Introduction

This tutorial introduces new users to the FaST-LMM software for GWAS analysis. The Atmosphere image is publicly available under the name FaST-LMM.Py v2.02.

...

Info

This is a tutorial for FaST-LMM as a distinct Atmosphere image. For FaST-LMM as a step in the Validate Workflow see here.

About the software

FaST-LMM (Factored Spectrally Transformed Linear Mixed Models) is a GWAS analysis tool designed for large data sets. Running a linear mixed model on a data set is thorough, but it is computationally demanding and may not be feasible on especially large data sets. FaST-LMM reduces the runtime needed to fit such a model. Normally, a linear mixed model for SNP data requires forming a genetic similarity matrix. FaST-LMM instead obtains the spectral decomposition of this similarity matrix without computing the matrix itself, and then uses the decomposition to test every SNP in the data set for statistical significance. This reduces computation time relative to other programs. For a more in-depth explanation, see the paper from Microsoft Research here.
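
The idea of obtaining a spectral decomposition without forming the similarity matrix can be illustrated with a short derivation. This is only a sketch under assumptions not stated on this page: the similarity matrix is taken to be K = (1/s) X Xᵀ, where X is a matrix of standardized SNP values for n individuals and s SNPs. The symbols are introduced here for illustration; see the paper for the exact formulation used by FaST-LMM.

```latex
% Assumption: X is the n x s matrix of standardized SNP values
% (n individuals, s SNPs) and K = (1/s) X X^T is the similarity matrix.
\begin{aligned}
X &= U \Sigma V^{\top} && \text{(thin SVD, with } V^{\top} V = I\text{)} \\
K &= \tfrac{1}{s}\, X X^{\top}
   = \tfrac{1}{s}\, U \Sigma V^{\top} V \Sigma U^{\top}
   = U \left( \tfrac{1}{s}\, \Sigma^{2} \right) U^{\top}
\end{aligned}
```

Under these assumptions, the columns of U are eigenvectors of K and the values σᵢ²/s are the corresponding eigenvalues, so the decomposition can be read off from X directly and the n × n matrix K never has to be built.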

Accessing FaST-LMM

To use FaST-LMM via VNC Viewer, follow these steps:

  1. Launch a new instance of FaST-LMM.Py v2.02 from Atmosphere, and access it using VNC.
  2. Once you have access to the instance, open up the terminal by either clicking the black icon at the bottom of the screen, or by going to Application > Accessories > Terminal.
  3. Begin coding!

Testing FaST-LMM

First, test that the Python components of the image are working correctly. Commands below that begin with "sudo" are run as the root user.

  1. First, open up your terminal.
  2. Change your directory to the feature_selection folder:

    Code Block
    cd /usr/FaST-LMM-master/fastlmm/feature_selection
    
  3. Run the test.py file using the following code:

    Code Block
    sudo python test.py
    

  4. A large amount of output will scroll past on the screen. This is normal. The whole testing process should take 7-8 minutes; once it finishes, you will see OK at the bottom of the screen.

  5. You are now able to test your data!
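
Because the test output scrolls by quickly, it can help to save it to a file and check the final line afterwards. The following is a minimal sketch, not part of the image: the function name check_test_log and the log path are hypothetical, and it assumes the run ends with a line beginning with OK, as described in step 4.

```shell
# Hypothetical helper: succeed only if the last non-empty line of a saved
# test log starts with "OK".
check_test_log() {
    log_file="$1"
    if [ ! -f "$log_file" ]; then
        echo "log not found: $log_file" >&2
        return 2
    fi
    # Drop blank lines, then keep only the final line of the log.
    last_line=$(grep -v '^[[:space:]]*$' "$log_file" | tail -n 1)
    case "$last_line" in
        OK*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example usage on the instance (not run here; the log path is an assumption):
#   cd /usr/FaST-LMM-master/fastlmm/feature_selection
#   sudo python test.py 2>&1 | tee ~/fastlmm_test.log
#   check_test_log ~/fastlmm_test.log && echo "tests passed"
```

The 2>&1 redirection is there in case the test runner writes its summary to standard error rather than standard output.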

Trying out your data

Info

Depending on the size of your dataset, you may wish to use the Stampede system for your data analysis. A FaST-LMM application on Stampede is forthcoming, or you may upload the software manually and use it that way.

...

  • -verboseOutput : Shows more detailed output. This flag does not take a file name.
  • -extract : A SNP filtering option used in conjunction with FaST-LMM-Select. FaST-LMM analyzes only the SNPs listed in the input file.
  • -pValuePrintThreshold : Restricts the output file to SNPs with a p-value less than or equal to the specified threshold.

Running example data

  1. To run example data, first change the working directory:

    Code Block
    cd /usr/FaST-LMM-master/fastlmm/feature_selection/examples
  2. Then run FaST-LMM with the following command-line options:

    Code Block
    fastlmmc -verboseOutput -bfile toydata -fileSim toydata -pheno toydata.phe -covar toydata.cov -out ~/Desktop/MyResults.csv -pValuePrintThreshold 0.05
    

Explanation of the code line

  1. fastlmmc – The main executable (the program name, not a flag).
  2. -verboseOutput – This flag triggers verbose mode, which gives more detailed output.
  3. -bfile – This flag indicates the binary PLINK file set to use in the analysis (BED/BIM/FAM set). Give the file set name WITHOUT THE FILE EXTENSION.
  4. -fileSim (can also be bfileSim or tfileSim) – This flag indicates the PLINK file set to use for computing the genetic similarity matrix. It can be the same file set as the one given to -bfile. Give the file set name WITHOUT THE FILE EXTENSION.

...

Info
NOTICE

Make sure that all of the data you want to analyze is either in the same folder as fastlmmc or referenced by the correct path!
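
One way to catch a wrong path before starting a run is to check that the expected files exist. The following is a minimal sketch: the function name check_fastlmm_inputs is hypothetical, and it checks only the BED/BIM/FAM set named by a prefix plus any other files passed explicitly, so files used by other options must be listed by hand.

```shell
# Hypothetical helper: verify that the input files for a run exist.
# $1 = PLINK file set prefix (no extension); remaining arguments = any other
# input files to check, such as the phenotype and covariate files.
check_fastlmm_inputs() {
    prefix="$1"
    shift
    missing=0
    for f in "$prefix.bed" "$prefix.bim" "$prefix.fam" "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            missing=1
        fi
    done
    # Returns 0 when every file was found, 1 otherwise.
    return "$missing"
}

# Example usage with the toy data (not run here):
#   cd /usr/FaST-LMM-master/fastlmm/feature_selection/examples
#   check_fastlmm_inputs toydata toydata.phe toydata.cov && \
#       fastlmmc -bfile toydata -fileSim toydata -pheno toydata.phe \
#                -covar toydata.cov -out ~/Desktop/MyResults.csv
```

Every missing file is reported before the function returns, so several path mistakes can be fixed in one pass.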

Additional information

For more information on how to run FaST-LMM, see the documentation here:

...