The iPlant App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found with the search bar at the top of the Apps window in the DE. To improve the chance of a successful search, enter only the part of the name that refers to the app's function or origin rather than the full app name and version number (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01'). In critical cases, please report your concern to the iPlant Ask forum or to support@iplantcollaborative.org. Thank you for your patience.

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to upendra@cyverse.org. Thank you.

Rationale and background:

rnaQUAST is a tool for evaluating RNA-Seq assemblies using a reference genome and a gene database. In addition, rnaQUAST can estimate gene database coverage by raw reads and perform de novo quality assessment using third-party software. This tutorial covers de novo quality assessment of transcripts using rnaQUAST 1.2.0. If you have a reference genome, you can use the reference-based rnaQUAST 1.2.0 app instead.

Pre-Requisites:

  1. A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)

  2. Input/Outputs 
    1. Transcript file(s) in FASTA format (Mandatory)
    2. Output directory to store all results.
  3. Options/Parameters
    1. Run with the GeneMarkS-T gene prediction tool. Use the `--prokaryote` option if the genome is prokaryotic. Eukaryote is the default.
    2. Run the BUSCO tool, which detects core genes in the assembly. This requires BUSCO lineage data (Eukaryota, Metazoa, Arthropoda, Vertebrata or Fungi). You can select the BUSCO profile files for your species of interest from here: /iplant/home/shared/iplantcollaborative/example_data/BUSCO.sample.data
    3. Use the disable_infer_genes option if your GTF file already contains gene records; otherwise gffutils will infer them. Note that gffutils may run for quite a long time.
    4. Use the disable_infer_transcripts option if your GTF file already contains transcript records; otherwise gffutils will infer them. Note that gffutils may run for quite a long time.
    5. Name(s) of the assemblies to be used in the reports, separated by spaces and given in the same order as the files with transcripts / alignments.
    6. Set if transcripts were assembled using strand-specific RNA-Seq data, in order to benefit from knowing whether the transcript originated from the + or - strand.
    7. Do not draw plots (makes rnaQUAST run a bit faster).
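
For readers who want to see how the options above might map onto a command line, here is a minimal sketch that assembles an argument list for a de novo run. Only `--prokaryote` is named in this tutorial. The script name `rnaQUAST.py` and the flags `--transcripts`, `--output_dir`, `--gene_mark`, `--labels`, `--strand_specific` and `--no_plots` are assumptions and should be checked against the tool's own help output before use. The helper `build_rnaquast_args` is hypothetical and only builds the list; it does not run anything.

```python
def build_rnaquast_args(transcripts, output_dir, gene_mark=False, prokaryote=False,
                        labels=None, strand_specific=False, no_plots=False):
    """Assemble an argument list for a de novo rnaQUAST run.

    All flag names except --prokaryote are assumptions; verify them
    against the installed rnaQUAST version.
    """
    if not transcripts:
        raise ValueError("at least one transcript FASTA file is required")
    args = ["rnaQUAST.py", "--transcripts", *transcripts, "--output_dir", output_dir]
    if gene_mark:
        args.append("--gene_mark")
        if prokaryote:
            # Eukaryote is the default, so the flag is only added on request.
            args.append("--prokaryote")
    if labels:
        # Labels must be given in the same order as the transcript files.
        if len(labels) != len(transcripts):
            raise ValueError("need exactly one label per transcript file")
        args += ["--labels", *labels]
    if strand_specific:
        args.append("--strand_specific")
    if no_plots:
        args.append("--no_plots")
    return args


cmd = build_rnaquast_args(
    ["idba.fasta", "spades.311.fasta", "Trinity.fasta"],
    "rnaQUAST_output_GM",
    gene_mark=True,
    labels=["idba", "spades", "trinity"],  # hypothetical labels
)
print(" ".join(cmd))
```

Keeping the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.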

Test/sample data:

The following test data for rnaQUAST 1.2.0 are provided here - /iplant/home/shared/iplantcollaborative/example_data/rnaQUAST.sample.data:

  1. idba.fasta

  2. spades.311.fasta

  3. Trinity.fasta 
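
Because the transcript files must be in FASTA format, it can help to run a quick sanity check before submitting an analysis. The sketch below is a hypothetical helper, not part of rnaQUAST or the DE. It counts header lines and sequence characters and rejects input that does not start with a `>` header.

```python
def count_fasta_records(lines):
    """Return (number of records, total sequence length) for FASTA text.

    Raises ValueError if sequence data precedes the first header or if
    no header is found at all.
    """
    n_records = 0
    total_len = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            n_records += 1
        elif n_records == 0:
            raise ValueError("sequence data found before the first '>' header")
        else:
            total_len += len(line)
    if n_records == 0:
        raise ValueError("no FASTA headers found")
    return n_records, total_len


example = [">t1", "ACGT", "AC", "", ">t2", "GGG"]
print(count_fasta_records(example))  # (2, 9)
```

For a real file, pass an open file handle, for example `with open("Trinity.fasta") as fh: count_fasta_records(fh)`.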

     

de novo quality assessment:

a. Using the rnaQUAST 1.2.0 tool with GeneMarkS-T

  1. Input file(s): idba.fasta, spades.311.fasta and Trinity.fasta (transcript files)

  2. Output folder name - rnaQUAST_output_GM

Leave the rest of the options at their defaults.

b. Using the rnaQUAST 1.2.0 tool with BUSCO

  1. Input file(s): idba.fasta, spades.311.fasta and Trinity.fasta (transcript files)

  2. Lineage data - Select the BUSCO profile folder "arthropoda" from here: /iplant/home/shared/iplantcollaborative/example_data/BUSCO.sample.data
  3. Output folder name - rnaQUAST_output_arthropoda_BUSCO

Leave the rest of the options at their defaults.

Output Reports

The following text reports are located in the comparison_output directory and include results for all input assemblies. In addition, these reports are provided in the <assembly_label>_output directories for each assembly separately.
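
Once the output folder has been downloaded, the directory layout described above can be listed with a short script. This is a minimal sketch, assuming the reports are `.txt` files that sit directly inside `comparison_output` and each `<assembly_label>_output` directory. The function name `find_reports`, the label `Trinity` and the placeholder files in the demo are illustrative only.

```python
import tempfile
from pathlib import Path


def find_reports(output_dir):
    """Map each '*_output' directory name to the sorted .txt report names in it."""
    root = Path(output_dir)
    reports = {}
    for sub in sorted(root.glob("*_output")):
        if sub.is_dir():
            reports[sub.name] = sorted(p.name for p in sub.glob("*.txt"))
    return reports


# Demo on a throwaway directory that mimics the layout; the files are empty placeholders.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("comparison_output", "Trinity_output"):
        d = Path(tmp) / sub
        d.mkdir()
        (d / "sensitivity.txt").write_text("")
        (d / "specificity.txt").write_text("")
    demo = find_reports(tmp)

print(demo)
```

Pointing `find_reports` at a real output folder (for example a local copy of `rnaQUAST_output_GM`) lists which reports were actually produced for that run.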

basic_metrics.txt 
Basic transcript metrics are calculated without a reference genome or gene database.

BUSCO metrics. The following metrics are calculated only when --busco and --clade options are used (see options for details).

GeneMarkS-T metrics. The following metrics are calculated when a reference genome and gene database are not provided.

alignment_metrics.txt 
Alignment metrics are calculated with a reference genome but without a gene database. To calculate the following metrics, rnaQUAST filters out all short partial alignments (see the --min_alignment option) and attempts to select the best hits for each transcript.

Number of assembled transcripts = Unaligned + Aligned = Unaligned + (Uniquely aligned + Multiply aligned + Misassembly candidates reported by GMAP (or BLAT)).
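
The identity above can be used as a quick consistency check when reading the alignment report. The sketch below uses made-up counts purely to show the arithmetic; the function name and the numbers are hypothetical and are not taken from any rnaQUAST output.

```python
def check_alignment_counts(total, unaligned, unique, multiple, misassembly_candidates):
    """Check: total = unaligned + (unique + multiple + misassembly candidates).

    Returns (is_consistent, aligned).
    """
    aligned = unique + multiple + misassembly_candidates
    return total == unaligned + aligned, aligned


# Hypothetical counts for illustration only.
ok, aligned = check_alignment_counts(
    total=1000, unaligned=120, unique=800, multiple=50, misassembly_candidates=30
)
print(ok, aligned)  # True 880
```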

Alignment metrics for non-misassembled transcripts

misassemblies.txt 

sensitivity.txt 
Assembly completeness (sensitivity). For the following metrics (calculated with reference genome and gene database) rnaQUAST attempts to select best-matching database isoforms for every transcript. Note that a single transcript can contribute to multiple isoforms in the case of, for example, paralogous genes or genomic repeats. At the same time, an isoform can be covered by multiple transcripts in the case of fragmented assembly or duplicated transcripts in the assembly.

specificity.txt 

Assembly specificity. To compute the following metrics we use only transcripts that have at least one significant alignment and are not misassembled.

Plots

The following plots are likewise located in both the comparison_output directory and the <assembly_label>_output directories. Note that most of the plots represent cumulative distributions and some plots use a logarithmic scale.
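
To make the term "cumulative distribution" concrete, the sketch below computes an empirical cumulative distribution for a handful of transcript lengths: for each distinct length, the fraction of transcripts that are no longer than it. This is a generic illustration with made-up values, not rnaQUAST's plotting code, and the exact quantity on each rnaQUAST plot may differ.

```python
def empirical_cdf(values):
    """Return (value, fraction of items <= value) for each distinct value, ascending."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for i, v in enumerate(ordered, start=1):
        if points and points[-1][0] == v:
            # Same value seen again: update the fraction for that value.
            points[-1] = (v, i / n)
        else:
            points.append((v, i / n))
    return points


lengths = [200, 200, 500, 1500]  # hypothetical transcript lengths in bp
print(empirical_cdf(lengths))  # [(200, 0.5), (500, 0.75), (1500, 1.0)]
```

When values span several orders of magnitude, as transcript lengths often do, a logarithmic axis keeps the short and long ends of such a curve readable.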