GEMMA

What is GEMMA?

GEMMA (Genome-wide Efficient Mixed Model Association) is an analysis tool designed primarily for linear mixed models and variations thereof. More specifically, GEMMA handles three types of mixed models: a linear mixed model for testing marker associations with a single phenotype, a multivariate linear mixed model for testing marker associations with multiple phenotypes, and a Bayesian sparse linear mixed model for estimating the proportion of variance in phenotypes explained (PVE) by typed genotypes, predicting phenotypes, and identifying associated markers.

How to Get Started

Like the FaST-LMM program, the GEMMA software is located in the /usr/bin directory, so the gemma executable can be called from anywhere on the computer. To see all available options for GEMMA, type gemma -h on the command line.

To run a basic mixed model analysis with GEMMA, you will need your inputs in either PLINK binary format (BED/BIM/FAM extensions) or BIMBAM format (a mean genotype file, a phenotype file, and an optional annotation file). GEMMA also requires a relatedness matrix to run the mixed model; it can calculate one from your data with its built-in relatedness matrix algorithm.

Calculating a Relatedness Matrix

With PLINK binary files

Type your commands in as shown:
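
A minimal sketch of such a call, assuming a PLINK file set with the hypothetical prefix dat (dat.bed, dat.bim, dat.fam) in the current directory and a hypothetical output name dat_kinship. The command is printed first, and the guard skips the run if gemma or the input files are not present.

```shell
# Hypothetical names: replace with your own PLINK prefix and output name.
PREFIX="dat"
OUT="dat_kinship"

# -gk 1 requests the centered relatedness matrix (use 2 for standardized).
CMD="gemma -bfile $PREFIX -gk 1 -o $OUT"
echo "$CMD"

# Run only when gemma and the input files are actually present.
# Note: $CMD is left unquoted on purpose, so file names must not contain spaces.
if command -v gemma >/dev/null 2>&1 && [ -f "$PREFIX.bed" ]; then
    $CMD
fi
```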

These file options indicate:

  • -bfile: A character string. The name of the PLINK binary file set, given without the extension. For example, if your files are named dat.bed, dat.bim, and dat.fam, you would just type dat after the -bfile flag.

  • -gk: An integer, either 1 or 2. Tells GEMMA the type of relatedness matrix to calculate. Option 1 calculates the centered relatedness matrix, while option 2 calculates the standardized relatedness matrix.

  • -o: A character string. Your designated name for the analysis output.

With BIMBAM files

Type your commands in as shown:
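
A minimal sketch of the BIMBAM variant, assuming hypothetical file names dat.geno.txt (mean genotypes) and dat.pheno.txt (phenotypes) and a hypothetical output name dat_kinship. As before, the run is skipped if gemma or the input files are missing.

```shell
# Hypothetical file names: replace with your own BIMBAM files.
GENO="dat.geno.txt"
PHENO="dat.pheno.txt"
OUT="dat_kinship"

# -gk 2 requests the standardized relatedness matrix (use 1 for centered).
CMD="gemma -g $GENO -p $PHENO -gk 2 -o $OUT"
echo "$CMD"

# Run only when gemma and both input files are actually present.
if command -v gemma >/dev/null 2>&1 && [ -f "$GENO" ] && [ -f "$PHENO" ]; then
    $CMD
fi
```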

These file options indicate:

  • -g: A character string. Indicates the mean genotype file in your set. The full name, including extension, is required.

  • -p: A character string. Indicates the phenotype file for your set. Again, the full name is required.

  • -gk and -o options are the same as above.

Once the relatedness matrix calculation is finished, GEMMA will create a folder called output in the current directory. There you will find the relatedness matrix file: <output name>.cXX.txt for the centered matrix (-gk 1) or <output name>.sXX.txt for the standardized matrix (-gk 2).

Running univariate and multivariate analysis

Now that you have a relatedness matrix, you can use that file in either a univariate mixed model analysis or a multivariate mixed model analysis.

Univariate with PLINK binary files

Type out your commands in the terminal like so:
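
A minimal sketch of a univariate run, assuming the hypothetical PLINK prefix dat, a centered relatedness matrix previously written to output/dat_kinship.cXX.txt, and a hypothetical output name dat_lmm. All three names are placeholders for your own files.

```shell
# Hypothetical names: replace with your own prefix, matrix file, and output name.
PREFIX="dat"
KINSHIP="output/dat_kinship.cXX.txt"
OUT="dat_lmm"

# -lmm 1 requests the Wald test; 2, 3, and 4 select the other tests.
CMD="gemma -bfile $PREFIX -k $KINSHIP -lmm 1 -o $OUT"
echo "$CMD"

# Run only when gemma, the PLINK files, and the matrix file are present.
if command -v gemma >/dev/null 2>&1 && [ -f "$PREFIX.bed" ] && [ -f "$KINSHIP" ]; then
    $CMD
fi
```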

These file options indicate:

  • -bfile: A character string. The PLINK binary file set name. As before, only the prefix is required; do not include any of the file extensions for this option.

  • -k: A character string. The name of the previously calculated relatedness matrix file. Full name, including file extension, is required.

  • -lmm: An integer between 1 and 4 inclusive. Specifies which frequentist test to use and which corresponding p-value to list in the output. Option 1 gives the Wald test, option 2 gives the likelihood ratio test, option 3 gives the score test, and option 4 performs all three tests.

  • -o: Character string. Specifies your desired output prefix for the analysis file.

Once the analysis is complete, check the output folder in your current directory for your mixed model output: <output name>.assoc.txt.
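
To skim the association results, one option is to filter rows by p-value. The sketch below assumes a hypothetical output name dat_lmm and assumes the Wald p-value column is labelled p_wald in the header, as in output from a -lmm 1 run; the helper function filter_assoc is illustrative and not part of GEMMA. It locates the column by header name rather than by position.

```shell
# Hypothetical output name; GEMMA writes its results under ./output.
ASSOC="output/dat_lmm.assoc.txt"

# Print the header plus every row whose p_wald value is below a threshold.
# $1 = association file, $2 = p-value threshold.
filter_assoc() {
    awk -v thr="$2" '
        NR == 1 {
            # Find the p_wald column from the header line.
            for (i = 1; i <= NF; i++) if ($i == "p_wald") col = i
            print
            next
        }
        col && ($col + 0) < (thr + 0)
    ' "$1"
}

# Only filter when the results file exists.
if [ -f "$ASSOC" ]; then
    filter_assoc "$ASSOC" 1e-5
fi
```

If the header has no p_wald column (for example, after a run with a different -lmm option), only the header line is printed.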

Univariate with BIMBAM files
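
A minimal sketch of the BIMBAM call, assuming hypothetical file names dat.geno.txt, dat.pheno.txt, and dat.anno.txt, a relatedness matrix previously written to output/dat_kinship.cXX.txt, and a hypothetical output name dat_lmm. The annotation file is optional, so the -a argument can be dropped.

```shell
# Hypothetical file names: replace with your own BIMBAM files and matrix file.
GENO="dat.geno.txt"
PHENO="dat.pheno.txt"
ANNO="dat.anno.txt"
KINSHIP="output/dat_kinship.cXX.txt"
OUT="dat_lmm"

# -lmm 4 performs the Wald, likelihood ratio, and score tests together.
CMD="gemma -g $GENO -p $PHENO -a $ANNO -k $KINSHIP -lmm 4 -o $OUT"
echo "$CMD"

# Run only when gemma and the required input files are present.
if command -v gemma >/dev/null 2>&1 && [ -f "$GENO" ] && [ -f "$PHENO" ] \
    && [ -f "$ANNO" ] && [ -f "$KINSHIP" ]; then
    $CMD
fi
```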

These file options indicate:

  • -g: Character string; the mean genotype file for your file set. The full filename, including extension, is required.

  • -p: Character string; the phenotype file for your set. Again, the full name is required.

  • -a: Character string; the annotation file of the set (optional).

  • -k: A character string. The name of the previously calculated relatedness matrix file. Full name, including file extension, is required.

  • -lmm: An integer between 1 and 4 inclusive. Specifies which frequentist test to use and which corresponding p-value to list in the output. Option 1 gives the Wald test, option 2 gives the likelihood ratio test, option 3 gives the score test, and option 4 performs all three tests.

  • -o: Character string. Specifies your desired output prefix for the analysis file.

Multivariate

For the multivariate mixed model, the only additional command line argument required is:

  • -n: One or more integers separated by spaces. Indicates which phenotype columns in the phenotype data (whether PLINK binary or BIMBAM format) are included in the association analysis.
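
A minimal sketch of a multivariate run with PLINK binary files, assuming the same hypothetical names as a univariate run (prefix dat, matrix file output/dat_kinship.cXX.txt) and a hypothetical output name dat_mvlmm. Here -n 1 2 selects the first and second phenotypes; adjust the numbers to match your own data.

```shell
# Hypothetical names: replace with your own prefix, matrix file, and output name.
PREFIX="dat"
KINSHIP="output/dat_kinship.cXX.txt"
OUT="dat_mvlmm"

# -n 1 2 includes phenotypes 1 and 2 in a joint (multivariate) analysis.
CMD="gemma -bfile $PREFIX -k $KINSHIP -lmm 1 -n 1 2 -o $OUT"
echo "$CMD"

# Run only when gemma, the PLINK files, and the matrix file are present.
if command -v gemma >/dev/null 2>&1 && [ -f "$PREFIX.bed" ] && [ -f "$KINSHIP" ]; then
    $CMD
fi
```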

Further Information

User Manual: GEMMA_user_manual.pdf

Developer’s Website: http://www.xzlab.org/software.html

Example Data: The same as the PLINK example data; see Example data.

This tool is still in development and we are testing it currently. If you notice any issues or have any comments we would greatly appreciate them!
Please contact us at labstapleton@gmail.com. Thank you for using our tools!