This space is home to learning materials and tutorials created for CyVerse products and services.


LEARNING MATERIALS
Maintenance: Tues, 28 Jan 2020

ACCESS TO OR USAGE OF THE FOLLOWING SERVICES WILL BE UNAVAILABLE OR DISRUPTED:

Discovery Environment         8:00am to 5:00pm MST
The Discovery Environment will be unavailable while patches and updates are applied.
        ** Currently running analyses will be terminated. Please plan accordingly.

Data Store                    8:00am to 5:00pm MST
The Data Store will be unavailable during the maintenance period.
 
Data Commons                  8:00am to 5:00pm MST
The Data Commons will be unavailable during the maintenance period.
 
Atmosphere and Cloud Services 8:00am to 5:00pm MST
Marana Cloud: Atmosphere instances in the Marana Cloud will be operational; however, you will not be able to use the Data Store within your instance, and you may not be able to access the Atmosphere web interface.
 
User Portal                   8:00am to 5:00pm MST
The User Portal, http://user.cyverse.org, will be unavailable while we perform maintenance and updates.
 
Agave/Science API             8:00am to 5:00pm MST
The Agave/Science API will be unavailable during this maintenance period.
 
DNA Subway                    8:00am to 5:00pm MST
DNA Subway will be unavailable during this maintenance period.
 
The following services will NOT be affected by the maintenance: CyVerse Wiki and JIRA.

Keep up to date with our maintenance schedules on the CyVerse public calendar:
http://www.cyverse.org/maintenance-calendar
Check your local time zone here: https://bit.ly/36iVOkX

Please contact support@cyverse.org with any questions or concerns.

Thank you,
CyVerse Staff



FaST-LMM

What is FaST-LMM?

FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a GWAS tool from Microsoft Research designed for large data sets. It has been tested on data sets with over 120,000 individuals. Linear mixed models are a thorough way to analyze such data, but they are computationally demanding and may be infeasible on very large data sets. FaST-LMM reduces the runtime needed to fit such a model. A standard LMM analysis of SNP data first builds a genetic similarity matrix between individuals. FaST-LMM instead obtains the spectral decomposition of this similarity matrix without computing the matrix itself, and then uses that decomposition to test every SNP in the data set for statistical significance. This makes it substantially faster than methods that must build and decompose the full matrix.
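The idea of decomposing the similarity matrix without building it can be illustrated with NumPy. This is a minimal sketch of the underlying linear algebra, not FaST-LMM's code. It assumes the similarity matrix is K = X X^T / k, where X is an n x k matrix of standardized genotypes for n individuals and k SNPs. A thin SVD of X gives the eigenvectors of K directly (the left singular vectors) with eigenvalues s^2 / k, so the n x n matrix K never has to be formed. When k is smaller than n, this is much cheaper than building and decomposing K. The genotype data below are random and purely hypothetical.

```python
import numpy as np


def similarity_spectrum(genotypes: np.ndarray):
    """Eigenvectors and eigenvalues of K = X X^T / k without forming K.

    genotypes: n x k array (n individuals, k SNPs), assumed already
    standardized per SNP (mean 0, variance 1 in each column).
    """
    k = genotypes.shape[1]
    # Thin SVD: X = U diag(s) V^T, with U of shape n x min(n, k).
    u, s, _ = np.linalg.svd(genotypes, full_matrices=False)
    # K = X X^T / k = U diag(s**2 / k) U^T, so the columns of U are
    # eigenvectors of K and s**2 / k are the matching eigenvalues.
    return u, s**2 / k


# Hypothetical toy data: 500 individuals, 50 SNPs coded 0/1/2.
rng = np.random.default_rng(0)
raw = rng.integers(0, 3, size=(500, 50)).astype(float)
x = (raw - raw.mean(axis=0)) / raw.std(axis=0)

u, eigvals = similarity_spectrum(x)
print(u.shape, eigvals.shape)  # (500, 50) (50,)

# Sanity check against the explicit n x n matrix (only feasible for small n).
k_explicit = x @ x.T / x.shape[1]
print(np.allclose(u @ np.diag(eigvals) @ u.T, k_explicit))  # True
```

Because only 50 SNPs are used here, K has at most 50 nonzero eigenvalues. The thin SVD returns exactly those, which is what keeps the cost proportional to n rather than n cubed.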

This tutorial relates specifically to FaST-LMM as run through a Validate Workflow instance. For information about the FaST-LMM image, see here.

FaST-LMM can be computationally demanding, and if an instance lacks sufficient memory, the process may be "killed" mid-computation. In our experience, filesets larger than 1 GB need at least 4 GB of memory to complete reliably.


How to Get Started

To run the FaST-LMM executable, type fastlmmc at the command line. Run on its own, it prints a help menu listing all available input options.

Input files

  1. A PED/MAP set of files (or the equivalent binary or transposed PLINK files; see the input flags below)

  2. A phenotype file whose individuals correspond to those in the PED/MAP set

  3. A set of PLINK-formatted files used to compute the genetic similarity matrix decomposition. This can be the same fileset as item 1

  4. A file of corresponding covariates (optional)
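The phenotype file must line up with the individuals in the PED/MAP set. The sketch below writes one, assuming the PLINK phenotype convention of family ID, individual ID, then the phenotype value, separated by whitespace. Check the FaST-LMM manual for the exact layout your version expects. It reads the family and individual IDs from the first two columns of a .ped file (the standard PLINK PED layout). The helper names, file names, IDs, and values are all hypothetical.

```python
import tempfile
from pathlib import Path


def ped_individuals(ped_path):
    """Return (family ID, individual ID) pairs from the first two
    columns of a PLINK .ped file, in file order."""
    ids = []
    with open(ped_path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                ids.append((fields[0], fields[1]))
    return ids


def write_phenotype_file(ped_path, values, out_path):
    """Write one 'FID IID value' line per individual in the .ped file.

    values: dict mapping (FID, IID) -> phenotype value. A KeyError is
    raised if any individual in the .ped file has no phenotype, which
    keeps the two files in correspondence.
    """
    with open(out_path, "w") as out:
        for fid, iid in ped_individuals(ped_path):
            out.write(f"{fid} {iid} {values[(fid, iid)]}\n")


with tempfile.TemporaryDirectory() as tmp:
    ped = Path(tmp) / "toy.ped"
    # Hypothetical two-individual, one-SNP PED file:
    # FID IID father mother sex phenotype allele1 allele2
    ped.write_text("F1 I1 0 0 1 -9 A G\nF2 I2 0 0 2 -9 G G\n")
    pheno = Path(tmp) / "toy.pheno.txt"
    write_phenotype_file(ped, {("F1", "I1"): 1.7, ("F2", "I2"): 2.3}, pheno)
    content = pheno.read_text()

print(content, end="")
# F1 I1 1.7
# F2 I2 2.3
```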

Input flags

  • -file : Base name of a PLINK .ped/.map fileset

  • -bfile : Base name of a PLINK binary .bed/.bim/.fam fileset

  • -tfile : Base name of a PLINK transposed .tped/.tfam fileset

Note: Every file in a PLINK set shares the same base name and differs only in extension. Give just that base name (for example, mydata for mydata.bed, mydata.bim, and mydata.fam) and use the flag that matches the format (-file, -bfile, or -tfile).

  • -pheno : Name of the phenotype file, including its extension

  • -out : Name of the output file. Unless you give a path, it is written to the same directory as the program and data

  • -fileSim : Name of the PLINK set used to compute the genetic similarity matrix and its decomposition. Like the genotype fileset, it is given by base name, so no file extension is needed
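Because each fileset flag takes only a base name, a missing or misnamed member file is easy to overlook. The following is a minimal pre-flight sketch that checks, for a given flag, whether every file with the extensions listed in the flags above exists. The function name and the mydata base name are hypothetical.

```python
import tempfile
from pathlib import Path

# Extensions each fileset flag expects, per the input flags above.
FILESET_EXTENSIONS = {
    "-file": (".ped", ".map"),
    "-bfile": (".bed", ".bim", ".fam"),
    "-tfile": (".tped", ".tfam"),
}


def missing_fileset_members(flag, base_name):
    """Return the paths that `base_name` plus each expected extension
    would need for `flag` but that do not exist on disk."""
    expected = [Path(f"{base_name}{ext}") for ext in FILESET_EXTENSIONS[flag]]
    return [str(p) for p in expected if not p.exists()]


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "mydata"  # hypothetical fileset base name
    for ext in (".bed", ".bim"):  # deliberately omit mydata.fam
        Path(f"{base}{ext}").touch()
    missing = [Path(p).name for p in missing_fileset_members("-bfile", base)]

print(missing)  # ['mydata.fam']
```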

Additional options

  • -verboseOutput : Produces more detailed output; this flag takes no file name

  • -covar : Name of the covariate file, including its extension

  • -pValuePrintThreshold : Restricts the output file to SNPs with a p-value less than or equal to the specified threshold
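The comparison behind -pValuePrintThreshold is inclusive, so a SNP whose p-value equals the threshold is still written. The sketch below only illustrates that rule on hypothetical in-memory (SNP ID, p-value) pairs. fastlmmc applies the filter itself, and nothing here assumes the layout of its output file.

```python
def apply_print_threshold(results, threshold):
    """Keep (snp_id, p_value) pairs whose p-value is <= threshold,
    mirroring the inclusive rule described for -pValuePrintThreshold."""
    return [(snp, p) for snp, p in results if p <= threshold]


# Hypothetical association results.
results = [("rs1", 0.2), ("rs2", 1e-6), ("rs3", 0.05), ("rs4", 0.051)]
kept = apply_print_threshold(results, 0.05)
print(kept)  # [('rs2', 1e-06), ('rs3', 0.05)]
```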

Example command line for running FaST-LMM
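A minimal sketch that assembles a fastlmmc command line from Python, using only the flags documented above. It assumes a PED/MAP fileset passed with -file and reused for -fileSim (item 3 of the input files allows the same fileset). The names mydata, mydata.pheno.txt, and mydata_results.txt, and the helper function itself, are hypothetical; substitute your own.

```python
import shlex


def build_fastlmmc_command(geno_base, pheno, out, sim_base=None,
                           covar=None, p_threshold=None, verbose=False):
    """Assemble a fastlmmc argument list for a PED/MAP fileset.

    sim_base defaults to geno_base, i.e. the same fileset is used to
    compute the genetic similarity matrix decomposition.
    """
    cmd = ["fastlmmc", "-file", geno_base, "-pheno", pheno,
           "-fileSim", sim_base or geno_base, "-out", out]
    if covar is not None:
        cmd += ["-covar", covar]
    if p_threshold is not None:
        cmd += ["-pValuePrintThreshold", str(p_threshold)]
    if verbose:
        cmd.append("-verboseOutput")  # takes no file name
    return cmd


cmd = build_fastlmmc_command("mydata", "mydata.pheno.txt", "mydata_results.txt")
print(shlex.join(cmd))
# fastlmmc -file mydata -pheno mydata.pheno.txt -fileSim mydata -out mydata_results.txt

# To run it (requires fastlmmc on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing an argument list rather than one shell string avoids quoting problems with file names that contain spaces.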

Further Information

User Manual: http://nbviewer.ipython.org/github/MicrosoftGenomics/FaST-LMM/blob/master/doc/ipynb/FaST-LMM.ipynb

Paper on FaST-LMM from Microsoft Research: http://www.nature.com/nmeth/journal/v8/n10/abs/nmeth.1681.html

Source code from Microsoft Research Github: https://github.com/MicrosoftGenomics/FaST-LMM

Example input and output data are attached to this page.

This tool is still in development and under active testing. If you notice any issues or have comments, we would greatly appreciate hearing from you.
Please contact us at labstapleton@gmail.com. Thank you for using our tools!