This tutorial introduces new users to FaST-LMM, a software package for genome-wide association study (GWAS) analysis. The Atmosphere image is publicly available under the name FaST-LMM.Py v2.02.
This tutorial covers FaST-LMM as a standalone Atmosphere image. For FaST-LMM as a step in the Validate Workflow, see here.
About the software
FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a GWAS analysis tool designed for large data sets. Linear mixed models are thorough, but fitting them is computationally demanding and can be infeasible on especially large data sets. FaST-LMM reduces the runtime needed to fit such a model. A linear mixed model for SNP data normally relies on a genetic similarity matrix computed between every pair of individuals. FaST-LMM instead obtains the spectral decomposition (eigendecomposition) of this similarity matrix without computing the matrix itself, and then reuses that decomposition to test every SNP in the data set for statistical significance. As a result, its computation time grows much more slowly with the number of individuals than that of conventional linear mixed model programs. For a more in-depth explanation, see the paper from Microsoft Research here.
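The core idea can be illustrated with NumPy. This is a minimal sketch, not FaST-LMM's own code: it assumes a small random matrix `X` standing in for standardized SNP values (individuals × SNPs) and takes the genetic similarity matrix to be K = XXᵀ/k, which is one common choice. A thin singular value decomposition (SVD) of `X` yields the eigenvectors and eigenvalues of K without ever building K, which is cheap when the number of SNPs k is smaller than the number of individuals n. The function name `spectral_decomposition_from_snps` is hypothetical.

```python
import numpy as np

def spectral_decomposition_from_snps(X):
    """Return (U, eigenvalues) of K = X @ X.T / k without building K.

    Hypothetical helper. X is an n x k matrix of standardized SNP values
    (n individuals, k SNPs); only the min(n, k) leading eigenpairs are returned.
    """
    k = X.shape[1]
    # Thin SVD: X = U @ diag(s) @ Vt, where U has shape (n, min(n, k)).
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # K = X X^T / k = U diag(s**2 / k) U^T, so the eigenvalues are s**2 / k.
    return U, s**2 / k

rng = np.random.default_rng(0)
n, k = 200, 50                      # more individuals than SNPs
X = rng.standard_normal((n, k))     # stand-in for standardized genotypes
U, eigvals = spectral_decomposition_from_snps(X)

# Check against the explicit (expensive) construction of K.
K = X @ X.T / k
print(np.allclose(U @ np.diag(eigvals) @ U.T, K))  # True
```

In FaST-LMM, a decomposition like this is computed once and then reused for every SNP test; the paper describes how the rest of the model is fitted.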
To use FaST-LMM via VNC Viewer, follow these simple steps:
- Launch a new instance of FaST-LMM.Py v2.02 from Atmosphere, and access it using VNC.
- Once you have access to the instance, open up the terminal by either clicking the black icon at the bottom of the screen, or by going to Application > Accessories > Terminal.
- Begin coding!
First, test that the Python components of the image work correctly. Commands below that begin with "sudo" are run as the root user.
- First, open up your terminal.
Change your directory to the feature_selection folder:
Run the test.py file with the following command:
sudo python test.py
A large amount of output will scroll past on the screen; this is normal. The whole test takes about 7-8 minutes, and when it finishes you should see OK at the bottom of the screen.
- You are now able to test your data!
Trying out your data
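Depending on the size of your dataset, you may want to run your analysis on the Stampede system rather than on this Atmosphere image. A FaST-LMM application for Stampede is forthcoming; until then, you can upload the software to Stampede manually and run it there.
The following options can be useful when analyzing your own data: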
- -verboseOutput : Shows more detailed output. This flag takes no argument, so no file name is needed.
- -extract : A SNP filtering option used with FaST-LMM-Select. FaST-LMM uses only the SNPs listed in the given file for the analysis.
- -pValuePrintThreshold : Restricts the output file to SNPs with a p-value less than or equal to the specified threshold.
Running example data
To run the example data, first change the working directory:
Then run FaST-LMM from the command line:
fastlmmc -verboseOutput -bfile toydata -fileSim toydata -pheno toydata.phe -covar toydata.cov -out ~/Desktop/MyResults.csv -pValuePrintThreshold 0.05
Explanation of the command
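- fastlmmc – The main FaST-LMM executable (the program itself, not a flag)
- -verboseOutput – This flag turns on verbose mode, which gives more detailed output
- -bfile – This flag indicates the binary PLINK file set to use in the analysis (BED/BIM/FAM set). DOES NOT INCLUDE FILE EXTENSION.
- -fileSim (or -bfileSim / -tfileSim) – This flag indicates the PLINK file set to use for computing the genetic similarity matrix. It can be the same file set as the one given to -bfile. DOES NOT INCLUDE FILE EXTENSION.
- -pheno – The phenotype file (here toydata.phe)
- -covar – The covariate file (here toydata.cov)
- -out – The file the results are written to
- -pValuePrintThreshold – Only SNPs with a p-value at or below this threshold (here 0.05) are written to the output file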
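If you run many analyses, it can help to build the command from Python instead of retyping it. The sketch below assembles the example `fastlmmc` command from this page as an argument list and runs it only if `fastlmmc` is found on your `PATH`. It uses only the flags described on this page; `build_fastlmmc_command` and its parameter names are hypothetical, and if `fastlmmc` is not on your `PATH` you would pass its full path as `executable`.

```python
import os
import shutil
import subprocess

def build_fastlmmc_command(bfile, file_sim, pheno, out, covar=None,
                           verbose=False, extract=None, p_threshold=None,
                           executable="fastlmmc"):
    """Return a fastlmmc argument list (hypothetical helper).

    bfile and file_sim are PLINK file-set prefixes given WITHOUT extensions.
    """
    cmd = [executable]
    if verbose:
        cmd.append("-verboseOutput")          # takes no value
    cmd += ["-bfile", bfile, "-fileSim", file_sim, "-pheno", pheno]
    if covar is not None:
        cmd += ["-covar", covar]
    if extract is not None:
        cmd += ["-extract", extract]          # use only SNPs listed in this file
    cmd += ["-out", os.path.expanduser(out)]  # no shell, so expand "~" ourselves
    if p_threshold is not None:
        cmd += ["-pValuePrintThreshold", str(p_threshold)]
    return cmd

# Rebuild the example command from this page.
cmd = build_fastlmmc_command("toydata", "toydata", "toydata.phe",
                             "~/Desktop/MyResults.csv", covar="toydata.cov",
                             verbose=True, p_threshold=0.05)
print(" ".join(cmd))

# Run it only if the executable can actually be found.
if shutil.which(cmd[0]):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than one string with `shell=True`, avoids quoting problems with file names; that is also why `~` has to be expanded explicitly.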
Make sure that all of the data you want to analyze is in the same folder as fastlmmc, or that you have specified the correct path to each file!
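Because -bfile takes a prefix rather than a file name, a missing or misspelled file is an easy mistake. The sketch below is a hypothetical helper, not part of FaST-LMM, that reports which of the three files in a binary PLINK set (BED/BIM/FAM) are missing before you launch fastlmmc. Relative prefixes such as `toydata` are resolved against the current working directory.

```python
from pathlib import Path

def missing_plink_files(prefix):
    """Return the BED/BIM/FAM files that are missing for a PLINK prefix.

    Hypothetical helper. prefix is given WITHOUT an extension, exactly as
    passed to -bfile; relative prefixes resolve against the working directory.
    """
    base = Path(prefix).expanduser()
    wanted = [base.with_name(base.name + ext) for ext in (".bed", ".bim", ".fam")]
    return [str(p) for p in wanted if not p.is_file()]

missing = missing_plink_files("toydata")
if missing:
    print("Missing PLINK files:", ", ".join(missing))
else:
    print("All BED/BIM/FAM files found.")
```

`with_name` is used instead of `with_suffix` so that prefixes containing dots, such as `data.v2`, are not truncated.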
If you want more information on how to run FaST-LMM, documentation can be found here: