The applications listed here are available for use in the Discovery Environment and are documented in: Discovery Environment Manual.

Discovery Environment Applications List

Alert:

The CyVerse App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found by using the search bar at the top of the Apps window in the DE. To improve the chance of a successful search, do not enter the entire app name and version number; search only for the portion that refers to the app's function or origin (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01').

Also, as part of the 2.8 app categorization, a number of apps were deprecated and are no longer available, and there is no longer an Archive category. You can search for a suitable replacement in the List of Applications in this window, or search on an app name or tool used for an app in the Apps window search field. If you need an app reinstated, please contact support@cyverse.org.

Tutorial under review

For an introduction to using the DE, see Using the Discovery Environment.

Please work through the tutorial and add your comments at the bottom of this page, or email comments to support@cyverse.org. Thank you.

Rationale and background

Currently, the DESeq apps in the DE (both DESeq and DESeq2) do not allow multifactorial pairwise comparison of RNA-Seq data for differential gene expression analysis. The app "DESeq2 (multifactorial pairwise comparisons)" is based on SARTools, an R package dedicated to the differential analysis of RNA-seq data, which does allow such comparisons. It provides tools to generate descriptive and diagnostic graphs, to run the differential analysis with the DESeq2 package, and to export the results into easily readable tab-delimited files. It also generates an HTML report that displays all the figures produced, explains the statistical methods, and gives the results of the differential analysis.

...

Note

SARTools is not intended to replace edgeR: it simply provides an environment to go with it. For more details about the methodology behind edgeR, users should read its documentation and papers. In addition, the current app is not intended to perform edgeR's GLM analysis. That version is currently in progress.

 


Introduction and Overview

...

    1. Target file: The user has to supply a tab-delimited file which describes the experiment, i.e. which contains the name of the biological condition associated with each sample. This file is called "target". This file has one row per sample and is composed of at least three columns with headers:

      • first column: unique names of the samples (short but informative as they will be displayed on all the figures); (Ex: "label")
      • second column: name of the count files; (Ex: "file")
      • third column: biological conditions; (Ex: "group")
      • optional columns: further information about the samples (day of library preparation for example). (Ex: "cond")

      The table below shows an example of a target file:

      label     file        group   cond
      5_OP_1    count3.txt  OP      5
      5_OP_2    count3.txt  OP      5
      5_OP_3    count3.txt  OP      5
      33_OP_1   count3.txt  OP      33
      33_OP_2   count3.txt  OP      33
      33_OP_3   count3.txt  OP      33
      5_M_1     count3.txt  M       5
      5_M_2     count3.txt  M       5
      5_M_3     count3.txt  M       5
      33_M_1    count3.txt  M       33
      33_M_2    count3.txt  M       33
      33_M_3    count3.txt  M       33
      5_LL_1    count3.txt  LL      5
      5_LL_2    count3.txt  LL      5
      5_LL_3    count3.txt  LL      5
      33_LL_1   count3.txt  LL      33
      33_LL_2   count3.txt  LL      33
      33_LL_3   count3.txt  LL      33
    2. Raw counts file or Raw counts folder: The DESeq2 statistical analysis assumes that reads have already been mapped and that counts per feature (gene or transcript) are available. There are two ways to provide the counts to the app. The user supplies either a single raw counts file that contains all the samples, with one tab-delimited column per sample and the same genes/transcripts for every sample, or a directory containing one count file per sample, each with two tab-delimited columns and no header:
      • the unique IDs of the features in the first column;
      • the raw counts associated with these features in the second column (null or positive integers).
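
The target file layout described in item 1 can be checked before a job is submitted. The following is a minimal sketch in Python and is not part of the app: the helper name `check_target` and the two-row example are hypothetical, and the checks cover only what the text states (at least three columns, one row per sample, unique sample names).

```python
import csv
import io


def check_target(text):
    """Parse a tab-delimited target file and run basic sanity checks.

    Hypothetical helper for illustration only; it is not part of the app.
    Returns one dict per sample, keyed by the column headers.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        raise ValueError("target file is empty")
    header, samples = rows[0], rows[1:]
    if len(header) < 3:
        raise ValueError("target file needs at least three columns")
    for row in samples:
        if len(row) != len(header):
            raise ValueError(f"row {row!r} does not match the header width")
    labels = [row[0] for row in samples]
    if len(set(labels)) != len(labels):
        raise ValueError("sample names in the first column must be unique")
    return [dict(zip(header, row)) for row in samples]


# Two rows taken from the example table above, written with real tabs.
example = (
    "label\tfile\tgroup\tcond\n"
    "5_OP_1\tcount3.txt\tOP\t5\n"
    "33_M_1\tcount3.txt\tM\t33\n"
)
samples = check_target(example)
print(sorted({s["group"] for s in samples}))
```

Running a check like this locally catches a missing column or a duplicated sample name before the files are uploaded to the Data Store.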

...

Note

When a counts directory is supplied, the number of count files in it must equal the number of rows in the target file. If the counts and target files are not supplied in the required formats, the app will not work and you will not be able to run the analysis.
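
The directory requirement in this note can be verified with a short script. This is a sketch under assumptions: the function names `read_count_file` and `load_counts_dir` and the sample file names are hypothetical, and the temporary directory stands in for a real counts folder.

```python
import os
import tempfile


def read_count_file(path):
    """Read a header-less file with two tab-delimited columns: ID and count."""
    counts = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            feature_id, value = line.split("\t")
            count = int(value)  # raises ValueError on non-integer counts
            if count < 0:
                raise ValueError(f"negative count for {feature_id}")
            counts[feature_id] = count
    return counts


def load_counts_dir(directory, n_target_rows):
    """Load every count file, requiring one file per target-file row."""
    names = sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
    if len(names) != n_target_rows:
        raise ValueError(
            f"{len(names)} count files found, but the target file "
            f"has {n_target_rows} rows"
        )
    return {name: read_count_file(os.path.join(directory, name))
            for name in names}


# Demonstration with a throwaway directory holding two tiny count files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("sample1.txt", "sample2.txt"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("geneA\t10\ngeneB\t0\n")
    counts = load_counts_dir(tmp, n_target_rows=2)

print(counts["sample1.txt"]["geneA"])
```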

...

    • Project name: name of the project (must be supplied by the user);

    • Author Name: author of the analysis (must be supplied by the user);

    • Reference biological condition: reference biological condition used to compute fold-changes (no default, must be one of the levels of the variable of interest in the target file);

    • batch: adjustment variable to use as a batch effect; must be a column of the target file ("day" for example, or NULL if no batch effect needs to be taken into account);

    • Variable of Interest: variable of interest, i.e. biological condition, in the target file (Mandatory. "group" by default);

    • FeaturesToRemove: character vector containing the IDs of the features to remove before running the analysis (default is "alignment_not_unique"; other available values, which remove HTSeq-count specific rows, are "ambiguous", "no_feature", "not_aligned", and "too_low_aQual");

    • locfunc: function used for the estimation of the size factors (default is "median", or "shorth" from the genefilter package);
    • Transformation method for PCA/clustering: method of transformation of the counts for the clustering and the PCA (default is "VST" for Variance Stabilizing Transformation, or "rlog" for Regularized Log Transformation);
    • Mean-variance relationship: type of model for the mean-dispersion relationship ("parametric" by default, or "local");
    • Independent Filtering: TRUE (default) or FALSE, to run or skip the independent filtering;
    • cooksCutoff: TRUE (default) or FALSE, to run or skip the detection of outliers;
    • Significance threshold: significance threshold applied to the adjusted p-values to select the differentially expressed features (default is 0.05);

    • p-value adjustment method: p-value adjustment method for multiple testing ("BH" by default, "BY" or any value of p.adjust.methods);

    • colors: colors used for the figures (one per biological condition).
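
To illustrate how the p-value adjustment method and the significance threshold interact, here is a minimal Python sketch of the Benjamini-Hochberg ("BH") procedure. It is not the app's code (the app relies on R for this step); the function name `bh_adjust` and the raw p-values are made up for the example, and the strict `<` comparison against the threshold is an assumption.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


alpha = 0.05  # default significance threshold listed above
raw = [0.001, 0.02, 0.03, 0.005, 0.5]  # illustrative values only
padj = bh_adjust(raw)
# Assumption: a feature is selected when its adjusted p-value is below alpha.
significant = [p < alpha for p in padj]
print(significant)
```

With these illustrative values the first four features pass the 0.05 threshold after adjustment and the last one does not.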

...

This tutorial uses the test data that is stored in the Data Store at Community Data > iplantcollaborative > example_data > DESeq2_multi.          

Starting

...

a DESeq2 (multifactorial pairwise comparisons) job in the DE

Open the DE Apps window and search for DESeq2 (multifactorial pairwise comparisons).

In the Analysis Name:

...

The following files and figures are generated

  • barplotTC.png: the total number of reads per sample;
  • barplotNull.png: percentage of null counts per sample;
  • densplot.png: estimation of the density of the counts for each sample;
  • majSeq.png: percentage of reads caught by the feature having the highest count in each sample;
  • pairwiseScatter.png: pairwise scatter plot between each pair of samples and SERE values (not produced if more than 30 samples);
  • diagSizeFactorsHist.png: diagnostic of the estimation of the size factors;
  • diagSizeFactorsTC.png: plot of the size factors vs the total number of reads;
  • countsBoxplot.png: boxplots on raw and normalized counts;
  • cluster.png: hierarchical clustering of the samples (based on VST or rlog data for DESeq2);
  • PCA.png: first and second factorial planes of the PCA on the samples based on VST or rlog data;
  • dispersionsPlot.png: graph of the estimations of the dispersions and diagnostic of log-linearity of the dispersions;
  • rawpHist.png: histogram of the raw p-values for each comparison;
  • MAplot.png: MA-plot for each comparison (log ratio of the means vs intensity);
  • volcanoPlot.png: volcano plot for each comparison ($-\log_{10}\text{(adjusted P value)}$ vs log ratio of the means).
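
The volcano plot axes listed above follow directly from two result columns. As a rough sketch in Python (the function name `volcano_point` and the input values are hypothetical, and the app itself draws this figure in R), each feature is placed at its log ratio on the x-axis and at $-\log_{10}$ of its adjusted p-value on the y-axis:

```python
import math


def volcano_point(log2_fc, padj):
    """Return the (x, y) position of one feature on a volcano plot.

    x is the log ratio of the means; y is -log10 of the adjusted p-value,
    so smaller adjusted p-values appear higher on the plot.
    """
    if not 0.0 < padj <= 1.0:
        raise ValueError("adjusted p-value must be in (0, 1]")
    return (log2_fc, -math.log10(padj))


# Illustrative values only: (log2 fold-change, adjusted p-value).
features = {"geneA": (2.5, 0.0001), "geneB": (-0.3, 0.6)}
points = {name: volcano_point(fc, p) for name, (fc, p) in features.items()}
for name, (x, y) in points.items():
    print(name, round(x, 2), round(y, 2))
```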


Some tab-delimited files are exported to the tables directory. They store information on the features, such as $\log_2\text{(FC)}$ or p-values, and can be read easily in a spreadsheet:

...