The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.
Rationale and background:
IUTA: Isoform Usage Two-step Analysis
Liang Niu, Weichun Huang, David M Umbach and Leping Li. BMC Genomics 2014, 15:862
IUTA is a tool for analyzing Illumina paired-end RNA-seq data to detect differential usage of gene transcript isoforms. It first uses the EM algorithm to estimate the usage of transcript isoforms for each gene, then tests for a difference in isoform usage between two groups using methods from compositional data analysis. IUTA takes RNA-seq alignment files (in BAM format) from two groups of samples, together with a gene annotation file (in GTF format) for the relevant species, and tests for differential isoform usage (the set of relative abundances of a gene's isoforms) for each queried gene. It outputs two tab-delimited files (with header): “estimates.txt” and “p_values.txt”.
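To make "isoform usage" concrete, here is a minimal sketch of how a gene's isoform abundances can be expressed as a composition, i.e. proportions that sum to 1. This is only an illustration of the concept, not IUTA's EM estimation; the function name `isoform_usage`, the isoform labels, and the abundance values are all hypothetical.

```python
def isoform_usage(abundances):
    """Convert non-negative isoform abundances for one gene into usage proportions.

    `abundances` maps an isoform label to its estimated abundance in one sample.
    The result is a composition: non-negative values that sum to 1.
    """
    total = sum(abundances.values())
    if total <= 0:
        raise ValueError("gene has no expressed isoform; usage is undefined")
    return {iso: value / total for iso, value in abundances.items()}


# Hypothetical abundances for a gene with three isoforms in one sample.
sample = {"iso_A": 30.0, "iso_B": 10.0, "iso_C": 0.0}
usage = isoform_usage(sample)
print(usage)
```

Because the proportions of a gene are constrained to sum to 1, they cannot vary independently, which is why the comparison between groups relies on compositional data analysis rather than testing each isoform separately.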
- A CyVerse account. (Register for a CyVerse account at user.cyverse.org)
- GTF: Gene annotation file (in GTF format) for the related species
- Bam_1: Folder containing BAM files for the replicates of samples in group one
- Bam_2: Folder containing BAM files for the replicates of samples in group two
- FLD: Whether to use an "empirical" or a "normal" fragment length distribution (FLD). If it is "empirical" (default), the empirical FLD (EFLD) is used; it is estimated from the data. If it is "normal", a discrete normal distribution is used as the FLD. In the latter case, the user can specify the mean and the standard deviation (sd) via mean.FL.normal and sd.FL.normal; if the user does not specify the mean and/or the standard deviation of the normal FLD, the corresponding estimate(s) from the raw EFLD (i.e., before smoothing) will be used
- Test.type: A character vector of the test types that the user wants to use for testing differential isoform usage in IUTA. Three types of test are available: "SKK" (default), "CQ" and "KY". The character vector is composed from these three test types, e.g., c("SKK","CQ") or c("CQ","SKK","KY")
- Mandatory output
- Output folder name: Output directory name (default is IUTA_output)
- Pie compare and Barplot charts
- Number of samples: Number of samples in the first group (default is 3)
- Gene name: Name of the gene.
- Pie and bar plots will be generated based on the provided gene name.
- If this option is left blank, all the genes in the estimates.txt file will be used for generating compressed pie and bar plots.
- Group name: A character vector of the names of the two groups. The first element is the name of the first group and the second element is the name of the second group. The default names are "1" and "2". Examples: 1,2; Sample1,Sample2
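The "normal" FLD option described in the parameter list can be pictured with the following minimal sketch. It takes the mean and standard deviation from a raw empirical distribution and builds a discrete normal distribution over integer fragment lengths. The discretisation used here (evaluating the normal density at each integer and renormalising) and the use of the population standard deviation are assumptions made for illustration; the source does not state how IUTA does this. The function names and the example counts are hypothetical.

```python
import math


def efld_moments(counts):
    """Mean and standard deviation of a raw empirical fragment length distribution.

    `counts` maps a fragment length (int) to the number of fragments observed.
    Uses the population standard deviation (an assumption for this sketch).
    """
    n = sum(counts.values())
    mean = sum(length * c for length, c in counts.items()) / n
    var = sum(c * (length - mean) ** 2 for length, c in counts.items()) / n
    return mean, math.sqrt(var)


def discrete_normal_fld(mean, sd, min_len, max_len):
    """Discrete normal FLD on the integers min_len..max_len (inclusive).

    The normal density is evaluated at each integer length and the
    weights are renormalised so that the probabilities sum to 1.
    """
    weights = {
        length: math.exp(-0.5 * ((length - mean) / sd) ** 2)
        for length in range(min_len, max_len + 1)
    }
    total = sum(weights.values())
    return {length: w / total for length, w in weights.items()}


# Hypothetical raw EFLD: fragment length -> observed count.
raw_efld = {180: 2, 200: 6, 220: 2}

# Fall back to the EFLD estimates, as when mean.FL.normal and
# sd.FL.normal are not supplied by the user.
mean_fl, sd_fl = efld_moments(raw_efld)
fld = discrete_normal_fld(mean_fl, sd_fl, 100, 300)
print(mean_fl, sd_fl)
```

Supplying explicit values instead of `mean_fl` and `sd_fl` corresponds to setting mean.FL.normal and sd.FL.normal.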
Test data for IUTA v1.0 is available here: /iplant/home/shared/iplantcollaborative/example_data/IUTA.sample.data/
- Open IUTA-1.0 app in DE
- Select/drag input file (mm10_kg_sample_IUTA.gtf) and folders (Bam_list_1 and Bam_list_2) into the Inputs section of the IUTA-1.0 app
- Leave the default output folder name (IUTA_output)
- Leave the default parameter (empirical) for FLD. For Test type, use SKK,CQ,KY
- Leave the default number of samples (3) and group names ("1","2"), and either enter Pcmtd1 as the Gene name or leave the Gene name text box blank for IUTA to assess all genes in estimates.txt
- Click Launch Analysis
Successful execution of the IUTA assessment pipeline will create the following output in the IUTA_output folder:
|estimates.txt||A tab-delimited text file (with header) containing 2 + n1 + n2 columns, where n1 is the number of samples in the first group and n2 is the number of samples in the second group: the first two columns are the gene name (column 1) and the isoform (column 2); the next n1 columns are the estimates of the relative abundance of the isoform in the samples of group one; the last n2 columns are the estimates of the relative abundance of the isoform in the samples of group two|
|p_values.txt||A table with 3 + 1 + 1 + (m − 1) + 1 columns, where m is the number of tests in test.type. The first three columns are “gene” (gene name), “number_of_isoform” (number of isoforms of the gene), and “test_sample_size” (number of samples in each group for which the isoform usage can be estimated, separated by a comma). The fourth column is “test”, the type of test used to calculate the next column, “p_value” (either the first test type in test.type, or NA when the test outputs NA). The fifth column is “p_value”, the p-value for the gene from the test named in the “test” column. The next m − 1 columns correspond to the p-values from the remaining tests in test.type, i.e. all except the first|
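The column layout of estimates.txt described above (2 + n1 + n2 columns) can be read back with a short script. The following is a minimal sketch using only the Python standard library; the function names, the header labels, the isoform names, the abundance values, and the handling of "NA" entries are all assumptions made for illustration and are not taken from actual IUTA output.

```python
import csv
import io


def to_float(field):
    """Convert a field to float; treat 'NA' or an empty field as missing (assumption)."""
    return None if field in ("NA", "") else float(field)


def split_estimates(handle, n1):
    """Split each row of an estimates.txt-style table into its two sample groups.

    Columns: gene, isoform, then n1 estimates for group one,
    then the remaining (n2) estimates for group two.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)  # the file has a header line
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        gene, isoform = fields[0], fields[1]
        values = [to_float(f) for f in fields[2:]]
        rows.append((gene, isoform, values[:n1], values[n1:]))
    return header, rows


# Hypothetical file content with n1 = 2 and n2 = 2 samples.
text = (
    "gene\tisoform\ts1\ts2\ts3\ts4\n"
    "Pcmtd1\tisoA\t0.6\t0.7\t0.2\t0.3\n"
    "Pcmtd1\tisoB\t0.4\t0.3\t0.8\t0.7\n"
)
header, rows = split_estimates(io.StringIO(text), n1=2)
for gene, isoform, group1, group2 in rows:
    print(gene, isoform, group1, group2)
```

For a real run, replace the `io.StringIO(text)` handle with `open("IUTA_output/estimates.txt", newline="")` and set `n1` to the value given for Number of samples.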
And the following figures are generated...