Performs differential expression analysis on count-based expression data sets.
DESeq identifies differentially expressed genes using a model based on the negative binomial distribution. Some earlier methods for identifying differentially expressed genes assumed a Poisson distribution. However, a Poisson model forces the variance to equal the mean, so it does not account for the extra variation (overdispersion) seen between biological replicates in expression data. DESeq, like edgeR, uses the negative binomial distribution instead, and estimates the variance by pooling information across genes, which keeps the analysis feasible when only a few replicates are available.
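To make the Poisson versus negative binomial difference concrete, the sketch below computes the mean and variance of a negative binomial distribution directly from its probability mass function. It assumes the common parametrization Var = mu + alpha * mu^2 with a single dispersion value alpha; DESeq itself fits a variance-mean relationship across genes rather than using one fixed alpha, so treat this only as an illustration of overdispersion. The function names `nb_pmf` and `moments` are hypothetical.

```python
import math


def nb_pmf(k: int, mu: float, alpha: float) -> float:
    """Negative binomial P(K = k) with mean mu and dispersion alpha > 0.

    Assumed parametrization: size r = 1/alpha, so Var = mu + alpha * mu**2.
    alpha -> 0 approaches the Poisson case, where Var = mu.
    """
    r = 1.0 / alpha
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def moments(pmf, kmax: int):
    """Mean and variance of a discrete distribution truncated at kmax."""
    probs = [pmf(k) for k in range(kmax + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


mu, alpha = 50.0, 0.2
mean, var = moments(lambda k: nb_pmf(k, mu, alpha), kmax=2000)
print(f"NB mean={mean:.2f}, NB variance={var:.2f}")
print(f"A Poisson model with the same mean would have variance {mu:.2f}")
```

With these illustrative values the negative binomial variance is roughly eleven times the mean, which is the kind of spread a Poisson model cannot represent.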
The input is a tab-delimited file containing genes and their expression values. The outputs include two files of differential expression test results: one with the results for all genes, and one with only the genes whose adjusted p-values fall below the chosen false-discovery-rate threshold. Three plots are also produced: the estimated dispersions, the log fold changes against the mean normalized counts, and a histogram of p-values. The plots are for visualization only and may not be needed by all users.
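The split into an "all results" file and a "significant results" file can be sketched as follows. This assumes the adjusted p-values come from the Benjamini-Hochberg procedure and that a gene is kept when its adjusted p-value is strictly below the false-discovery-rate threshold; the helpers `bh_adjust` and `split_results` and the example p-values are hypothetical, not the app's own code or data.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def split_results(genes, pvalues, fdr=0.1):
    """Return (all rows, rows passing the FDR cutoff) as (gene, p, padj) tuples."""
    padj = bh_adjust(pvalues)
    all_results = list(zip(genes, pvalues, padj))
    significant = [row for row in all_results if row[2] < fdr]
    return all_results, significant


genes = ["g1", "g2", "g3", "g4"]
pvals = [0.01, 0.04, 0.03, 0.5]
all_results, significant = split_results(genes, pvals, fdr=0.1)
for gene, p, padj in all_results:
    print(gene, p, round(padj, 4))
print("significant:", [row[0] for row in significant])
```

Lowering `fdr` shrinks the significant list, while the all-results list is unchanged.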
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106. doi: 10.1186/gb-2010-11-10-r106. Epub 2010 Oct 27.
- To use DESeq, your input file must be tab-delimited. You must also know the library type (either "single-end" or "paired-end") of each data column in your input file.
- Resources: http://bioconductor.org/packages/release/bioc/vignettes/DESeq/inst/doc/DESeq.pdf
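Before running the app, it can help to check that the file parses as a tab-delimited table and that you have one factor and one library type per data column. The sketch below does this with Python's standard `csv` module. The function names, the four-sample demo table, and the assumption that the feature column is excluded from the data columns are all hypothetical illustrations, not the app's own validation logic.

```python
import csv
import io


def read_count_table(handle, feature_col=1):
    """Parse a tab-delimited count table.

    feature_col is 1-based, like the app's "Column where feature names are found".
    Returns (data column names, {feature name: [integer counts]}).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    idx = feature_col - 1
    samples = header[:idx] + header[idx + 1:]
    counts = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        values = row[:idx] + row[idx + 1:]
        counts[row[idx]] = [int(v) for v in values]
    return samples, counts


def check_annotations(samples, factors_csv, libtypes_csv):
    """Check that one factor and one library type is given per data column."""
    factors = [f.strip() for f in factors_csv.split(",")]
    libtypes = [t.strip() for t in libtypes_csv.split(",")]
    for name, values in (("factors", factors), ("library types", libtypes)):
        if len(values) != len(samples):
            raise ValueError(f"{len(values)} {name} given for {len(samples)} data columns")
    unknown = sorted(set(libtypes) - {"single-end", "paired-end"})
    if unknown:
        raise ValueError(f"unknown library type(s): {unknown}")
    return factors, libtypes


# Hypothetical four-sample table (not the real DESeq_test_data.tsv).
demo = "gene\tu1\tu2\tt1\tt2\ngeneA\t10\t12\t30\t28\ngeneB\t0\t1\t0\t2\n"
samples, counts = read_count_table(io.StringIO(demo), feature_col=1)
factors, libtypes = check_annotations(
    samples,
    "untreated,untreated,treated,treated",
    "single-end,paired-end,single-end,paired-end",
)
print(samples, counts["geneA"], factors)
```

For a real file, pass `open(path, newline="")` instead of the `io.StringIO` demo.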
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> DESeq.
Use DESeq_test_data.tsv from the directory above as test input. This example data comes from the Drosophila RNA-Seq example used in the DESeq paper; it is unpublished data from B. Wilczynski, Y.-H. Liu, N. Delhomme and E. Furlong.
Parameters Used in App
When the app is run in the Discovery Environment, use the following parameters with the above input file to get the output described in the next section.
- Tab-delimited input: DESeq_test_data.tsv from the directory listed above.
- Column where feature names are found: 1
- Comma-separated list of factors for the data columns in your file: untreated,untreated,untreated,untreated,treated,treated,treated
- Comma-separated list of library types for each factor listed above: single-end,single-end,paired-end,paired-end,single-end,paired-end,paired-end
- Comma-separated pair of factors for comparison: untreated,treated
- Minimum false-discovery rate: 0.1
- Quantile for removing insignificant genes: 0.4
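The quantile parameter is used to discard genes with very low counts before testing. A minimal sketch, assuming the app follows the filtering shown in the DESeq vignette (keep only genes whose total count across all samples is above the chosen quantile of those totals) and R's default linear-interpolation quantile: the helper names and the five-gene example are hypothetical.

```python
import math


def quantile_type7(values, p):
    """Sample quantile with linear interpolation (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def filter_low_count_genes(counts, q=0.4):
    """Keep genes whose total count is strictly above the q-quantile of totals."""
    totals = {gene: sum(row) for gene, row in counts.items()}
    cutoff = quantile_type7(list(totals.values()), q)
    return {gene: row for gene, row in counts.items() if totals[gene] > cutoff}


counts = {
    "g1": [0, 0, 1],     # total 1
    "g2": [2, 3, 1],     # total 6
    "g3": [10, 12, 9],   # total 31
    "g4": [50, 40, 60],  # total 150
    "g5": [5, 5, 5],     # total 15
}
kept = filter_low_count_genes(counts, q=0.4)
print(sorted(kept))
```

With q = 0.4 roughly the lowest 40% of genes by total count are removed; they have too little data to reach significance and would only add to the multiple-testing burden.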
For the test case, the example_data directory contains the following output files:
- DESeq_Dispersion.png - plot of the estimated dispersion
- DESeq_MAPlot.png - plot of log fold changes against the mean normalized counts
- DESeq_pValues.png - histogram of p-values
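The "mean normalized counts" on the x-axis of the MA plot depend on per-sample size factors. The sketch below uses the median-of-ratios method described in the DESeq paper (each sample's median ratio of its counts to the per-gene geometric mean across samples), then computes each gene's mean normalized count and log2 fold change between two conditions. It is an illustration under assumptions, not the app's code: the helper names and toy counts are hypothetical, genes with a zero count are simply skipped when estimating size factors, and genes with a zero mean in either condition are left out rather than given infinite fold changes.

```python
import math
import statistics


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts maps gene -> [count per sample]. Genes with a zero in any sample
    are skipped because their geometric mean would be zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) <= 0:
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(statistics.median(r)) for r in log_ratios]


def ma_points(counts, factors, cond_a, cond_b):
    """Map gene -> (mean normalized count, log2 fold change of cond_b over cond_a)."""
    sf = size_factors(counts)
    a_idx = [j for j, f in enumerate(factors) if f == cond_a]
    b_idx = [j for j, f in enumerate(factors) if f == cond_b]
    points = {}
    for gene, row in counts.items():
        norm = [c / s for c, s in zip(row, sf)]
        mean_a = sum(norm[j] for j in a_idx) / len(a_idx)
        mean_b = sum(norm[j] for j in b_idx) / len(b_idx)
        if mean_a > 0 and mean_b > 0:
            points[gene] = (sum(norm) / len(norm), math.log2(mean_b / mean_a))
    return points


counts = {
    "g1": [10, 20, 10, 20],
    "g2": [100, 200, 100, 200],
    "g3": [5, 10, 20, 40],  # about 4x higher in the treated samples
}
factors = ["untreated", "untreated", "treated", "treated"]
print(size_factors(counts))
for gene, (base_mean, lfc) in ma_points(counts, factors, "untreated", "treated").items():
    print(gene, round(base_mean, 2), round(lfc, 2))
```

In a plot, the first value of each pair would go on a log-scaled x-axis and the fold change on the y-axis.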