This tutorial introduces new users to FaST-LMM, a software tool for genome-wide association study (GWAS) analysis. The Atmosphere image is publicly available under the name FaST-LMM.Py v2.02.
All of the necessary Python modules are already installed on this instance, so you can get started analyzing right away!
This tutorial covers FaST-LMM as a standalone Atmosphere image. For FaST-LMM as a step in the Validate Workflow, see here.
Learn about CyVerse's allocation policies here.
FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a GWAS analysis tool designed for large data sets. A standard linear mixed model is thorough but computationally demanding, and on especially large data sets it may not run at all. FaST-LMM reduces the run time needed to fit such a model. For SNP data, a linear mixed model normally relies on a genetic similarity matrix computed between all pairs of individuals. FaST-LMM obtains the spectral decomposition of this similarity matrix without computing the matrix itself, and then uses that decomposition to test every SNP in the data set for statistical significance. As a result, its run time and memory use scale linearly with the number of individuals in the study. For a more in-depth explanation, see the paper from Microsoft Research here.
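To make the spectral-decomposition idea concrete, here is a small NumPy sketch. It is an illustration, not FaST-LMM's own code, and it assumes the similarity matrix is K = X X^T / k for a standardized n x k genotype matrix X (n individuals, k SNPs). Because X X^T = U diag(s^2) U^T for the thin SVD X = U diag(s) V^T, the eigenvalues and eigenvectors of K can be read straight off the SVD of X, so the n x n matrix K never has to be built. The thin SVD costs on the order of n*k^2 operations rather than n^3, which is where the savings come from when k is smaller than n.

```python
import numpy as np


def spectral_decomposition_from_snps(X):
    """Eigenvalues/eigenvectors of K = X X^T / k, computed from the SVD of X.

    ASSUMPTION: K is defined as X X^T / k for a standardized n x k matrix X.
    The n x n matrix K is never formed here.
    """
    n, k = X.shape
    # Thin SVD: X = U diag(s) Vt, with U of shape n x min(n, k).
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # X X^T = U diag(s**2) U^T, so the nonzero eigenvalues of K are s**2 / k.
    return s ** 2 / k, U


rng = np.random.default_rng(0)
n_individuals, n_snps = 200, 20  # k < n: the case where this shortcut pays off
G = rng.integers(0, 3, size=(n_individuals, n_snps)).astype(float)  # 0/1/2 genotypes
X = (G - G.mean(axis=0)) / G.std(axis=0)  # standardize each SNP column

eigvals, eigvecs = spectral_decomposition_from_snps(X)

# Cross-check against the explicit (more expensive) route of building K.
K = X @ X.T / n_snps
explicit = np.sort(np.linalg.eigvalsh(K))[::-1][:n_snps]
print(np.allclose(eigvals, explicit))
```

The explicit check at the end is only there to show the two routes agree; in a real analysis you would skip building K entirely.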
To use FaST-LMM via VNC Viewer, follow these simple steps:
First, test that the Python components of the image are working correctly. Commands below that begin with "sudo" are run as the root user.
Change your directory to the feature_selection folder:
Run the test.py file with the following command:
sudo python test.py
A large amount of text will scroll past on the screen. This is normal. The whole test run takes about 7-8 minutes, and when it finishes you will see OK at the bottom of the screen.
Depending on the size of your data set, you may wish to run your analysis on the Stampede system. A FaST-LMM application on Stampede is forthcoming; in the meantime, you can upload the software to Stampede manually and run it there.
Keep in mind that the main FaST-LMM program is a C-based executable. The Python-compatible functions in FaST-LMM, while important, are technically extensions of this main program. The most important Python-compatible functions are as follows:
If you wish to try out any of these Python functions, consult the FaST-LMM documentation stored in the usr directory. To open it on the image, minimize the terminal and navigate in the file manager to File manager > usr > Fast-Lmm-Docs.
The remainder of this tutorial focuses strictly on the main FaST-LMM program and its output.
The main program is installed in the /usr/bin/ folder, which is on your PATH, so you can run it from any directory simply by typing fastlmmc in the terminal.
FaST-LMM uses four primary input files:
The first two SNP files must be in one of the PLINK formats (PED/MAP, BED/BIM/FAM, or TPED/TFAM). The input flags you can use are as follows:
These are the bare minimum options needed to run FaST-LMM. Some other options worth considering are:
To run the example data, first change your working directory:
Then run FaST-LMM with the following command-line options:
fastlmmc -verboseOutput -bfile toydata -fileSim toydata -pheno toydata.phe -covar toydata.cov -out ~/Desktop/MyResults.csv -pValuePrintThreshold 0.05
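-fileSim – the SNPs used to build the genetic similarity matrix are a PED/MAP set
-bfileSim – the SNPs used to build the genetic similarity matrix are a BED/BIM/FAM set
-tfileSim – the SNPs used to build the genetic similarity matrix are a TPED/TFAM set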
5. -pheno – The phenotype file for the analysis. THIS DOES INCLUDE THE FILE EXTENSION.
6. -covar – The covariate file; this flag is optional. THIS DOES INCLUDE THE FILE EXTENSION.
7. -out – The name and location of the output file. The output is saved as .txt unless you specify another extension; a .csv extension can be more convenient for certain later analyses.
8. -pValuePrintThreshold – Optional. Only SNPs with p-values below the threshold you give are printed (0.05 in this example).
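If you run many analyses, the same command can be assembled from a script. The sketch below assumes Python 3 and uses only the flags shown in this tutorial; the helper build_fastlmmc_command and the SIM_FLAGS table are hypothetical names for illustration, not part of FaST-LMM.Py. It only launches fastlmmc when the executable is found on the PATH, and otherwise prints the command it would have run.

```python
import os
import shlex
import shutil
import subprocess

# Hypothetical lookup: similarity-file flag for each PLINK file-set format above.
SIM_FLAGS = {"ped": "-fileSim", "bed": "-bfileSim", "tped": "-tfileSim"}


def build_fastlmmc_command(bfile, sim_prefix, sim_format, pheno, out,
                           covar=None, p_threshold=None, verbose=True):
    """Return the fastlmmc argument list, in the same order as the example command.

    SNP file sets are given WITHOUT extensions; pheno, covar and out include them.
    """
    cmd = ["fastlmmc"]
    if verbose:
        cmd.append("-verboseOutput")
    cmd += ["-bfile", bfile, SIM_FLAGS[sim_format], sim_prefix, "-pheno", pheno]
    if covar is not None:        # -covar is optional
        cmd += ["-covar", covar]
    cmd += ["-out", out]
    if p_threshold is not None:  # -pValuePrintThreshold is optional
        cmd += ["-pValuePrintThreshold", str(p_threshold)]
    return cmd


# Rebuild the toy-data example command.
cmd = build_fastlmmc_command(
    "toydata", "toydata", "ped", "toydata.phe",
    os.path.expanduser("~/Desktop/MyResults.csv"),
    covar="toydata.cov", p_threshold=0.05,
)

if shutil.which("fastlmmc"):
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
else:
    print("fastlmmc not on PATH; command would be:", shlex.join(cmd))
```

Passing the arguments as a list rather than one shell string avoids quoting problems with file paths that contain spaces.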
This writes the results to the Desktop of your Atmosphere instance, and the output will look like this.
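Once the results file exists, you may want to pull out the significant SNPs in Python. This is a minimal sketch assuming Python 3; the function significant_snps is hypothetical, and the column names "SNP" and "Pvalue" are assumptions, so check the header line of your own results file and pass the real names if they differ. The delimiter (tab or comma) is guessed from that header line.

```python
import csv
import os


def significant_snps(path, threshold=0.05, p_column="Pvalue", snp_column="SNP"):
    """Return (snp, p) pairs whose p-value is below threshold.

    ASSUMPTION: 'SNP' and 'Pvalue' are guessed column names; adjust as needed.
    """
    with open(path, newline="") as handle:
        header = handle.readline()
        handle.seek(0)
        # Guess the delimiter from the header: tab if present, otherwise comma.
        delimiter = "\t" if "\t" in header else ","
        hits = []
        for row in csv.DictReader(handle, delimiter=delimiter):
            p = float(row[p_column])
            if p < threshold:
                hits.append((row[snp_column], p))
    return hits


results = os.path.expanduser("~/Desktop/MyResults.csv")
if os.path.exists(results):
    # Print the hits from smallest to largest p-value.
    for snp, p in sorted(significant_snps(results), key=lambda hit: hit[1]):
        print(snp, p)
```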
Make sure that the data you want to analyze is in your current working directory, or that you have specified the correct path to each file!
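A quick check that every input file exists can save a failed run. This minimal sketch assumes Python 3; missing_inputs and PLINK_EXTENSIONS are hypothetical helpers, not part of FaST-LMM. The extension lists follow the PLINK file sets named in this tutorial, and relative names are resolved against the current working directory unless you pass another directory.

```python
import os

# File extensions that make up each PLINK file set listed in this tutorial.
PLINK_EXTENSIONS = {
    "ped": (".ped", ".map"),
    "bed": (".bed", ".bim", ".fam"),
    "tped": (".tped", ".tfam"),
}


def missing_inputs(prefix, fmt, extra_files=(), directory="."):
    """List input files that do not exist, resolved relative to directory.

    prefix/fmt describe a PLINK file set given without extension (as for -bfile);
    extra_files are full file names such as the -pheno and -covar files.
    """
    names = [prefix + ext for ext in PLINK_EXTENSIONS[fmt]] + list(extra_files)
    paths = [os.path.join(directory, name) for name in names]
    return [path for path in paths if not os.path.isfile(path)]


# Example: the toy data above (BED set for -bfile, PED set for -fileSim).
missing = (missing_inputs("toydata", "bed", ["toydata.phe", "toydata.cov"])
           + missing_inputs("toydata", "ped"))
if missing:
    print("Missing input files:", ", ".join(missing))
```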
If you want more information on how to run FaST-LMM, documentation can be found here:
and demo data can be found here: