Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to kchougul@cshl.edu. Thank you.

Rationale and background:

 

Salmon: Accurate, Versatile and Ultrafast Quantification from RNA-seq Data using Lightweight-Alignment (doi:10.1038/nbt.2862)

 

Rob Patro, Geet Duggal, Carl Kingsford (2015)

Salmon is a tool for wicked-fast transcript quantification from RNA-seq data. It requires a set of target transcripts (either from a reference or de-novo assembly) to quantify. All you need to run Salmon is a FASTA file containing your reference transcripts and a (set of) FASTA/FASTQ file(s) containing your reads. Optionally, Salmon can make use of pre-computed alignments (in the form of a SAM/BAM file) to the transcripts rather than the raw reads.


Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account at user.cyverse.org)
  2. Mandatory arguments
    1. Transcript file name (in FASTA format)
    2. FASTQ files (paired-end or single-end reads in fastq or fastq.gz format)
    3. Read file type (whether the library contains paired-end or single-end reads)
    4. Library type (the type of sequencing library). Leave the default, A, if you are not sure; otherwise see the documentation for the value to enter: http://salmon.readthedocs.io/en/latest/salmon.html#what-s-this-libtype
  3. Optional arguments
    1. Number of bootstraps (a positive integer that sets the number of bootstrap samples to compute. The more samples computed, the better the estimates of variance, but the more computation and time required)
    2. Number of Gibbs samples (like bootstrapping, this option produces samples that allow us to estimate the variance in abundance estimates. However, in this case the samples are generated using posterior Gibbs sampling over the fragment equivalence classes rather than bootstrapping)
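
As a rough illustration of how the arguments above might map onto the Salmon command line, the sketch below assembles `salmon index` and `salmon quant` invocations as argument lists. It assumes the app runs these two steps in sequence, which is not confirmed by this page; the function name `build_salmon_commands` and the default directory names are hypothetical. The flags used (`-t`, `-i`, `-l`, `-1`, `-2`, `-r`, `-o`, `--numBootstraps`, `--numGibbsSamples`) come from the Salmon command-line documentation.

```python
from typing import List, Optional, Sequence


def build_salmon_commands(
    transcripts: str,
    reads: Sequence[str],
    paired: bool,
    lib_type: str = "A",
    num_bootstraps: Optional[int] = None,
    num_gibbs_samples: Optional[int] = None,
    index_dir: str = "Index",        # hypothetical directory name
    output_dir: str = "quant_out",   # hypothetical directory name
) -> List[List[str]]:
    """Return [index_command, quant_command] as argument lists."""
    # Step 1: build an index from the transcript FASTA file.
    index_cmd = ["salmon", "index", "-t", transcripts, "-i", index_dir]

    # Step 2: quantify the reads against that index.
    quant_cmd = ["salmon", "quant", "-i", index_dir, "-l", lib_type]
    if paired:
        if len(reads) != 2:
            raise ValueError("paired-end mode expects exactly two read files")
        quant_cmd += ["-1", reads[0], "-2", reads[1]]
    else:
        if len(reads) < 1:
            raise ValueError("single-end mode expects at least one read file")
        quant_cmd += ["-r", *reads]

    # Optional variance-estimation settings.
    if num_bootstraps is not None:
        if num_bootstraps <= 0:
            raise ValueError("number of bootstraps must be a positive integer")
        quant_cmd += ["--numBootstraps", str(num_bootstraps)]
    if num_gibbs_samples is not None:
        if num_gibbs_samples <= 0:
            raise ValueError("number of Gibbs samples must be a positive integer")
        quant_cmd += ["--numGibbsSamples", str(num_gibbs_samples)]

    quant_cmd += ["-o", output_dir]
    return [index_cmd, quant_cmd]


# Example with the file names used in the test data of this tutorial.
index_cmd, quant_cmd = build_salmon_commands(
    "transcripts.fa", ["reads_1.fq", "reads_2.fq"], paired=True
)
print(" ".join(index_cmd))
print(" ".join(quant_cmd))
```

If Salmon is installed locally, each returned list could be passed to `subprocess.run(cmd, check=True)`. Inside the Discovery Environment the app handles this step, so the sketch only shows what the form fields correspond to.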

Test/sample data 

The following test data are provided for testing salmon-index-quant-0.8.1:

  1. Transcript file - transcripts.fa
  2. FASTQ files - reads_1.fq and reads_2.fq

Run salmon-index-quant-0.8.1 on the FASTQ files (reads_1.fq and reads_2.fq) using 'transcripts.fa'.

Results 

Successful execution of salmon-index-quant will create a directory named reads_1. The directory will contain several files and directories:

  1. logs
  2. Index
  3. reads_1
    1. quant.sf

When the quantification step is finished, the directory <quant_dir> will contain a file named "quant.sf". This file contains the result of the Salmon quantification step. It is a plain-text, tab-separated file with a single header line that names the columns: (1) Name (the transcript ID), (2) Length (the transcript length), (3) EffectiveLength, (4) TPM (Transcripts per Million) and (5) NumReads (an estimate of the number of reads drawn from this transcript given the transcript's relative abundance and length).
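
To inspect the results programmatically, a minimal sketch along the following lines may help. It assumes quant.sf is tab-separated with a header row naming the columns `Name`, `Length`, `EffectiveLength`, `TPM` and `NumReads`, as described in the Salmon documentation. The function name `read_quant_sf` is hypothetical, and the two example rows are made up for illustration; they are not output from the test data.

```python
import csv
import io
from typing import Dict, Iterable, List


def read_quant_sf(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Parse quant.sf lines into one dictionary per transcript."""
    reader = csv.DictReader(lines, delimiter="\t")
    records = []
    for row in reader:
        records.append(
            {
                "Name": row["Name"],
                "Length": int(row["Length"]),
                "EffectiveLength": float(row["EffectiveLength"]),
                "TPM": float(row["TPM"]),
                "NumReads": float(row["NumReads"]),
            }
        )
    return records


# Made-up example content in the assumed quant.sf layout.
example = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1000\t850.0\t600000.0\t30.0\n"
    "tx2\t500\t350.0\t400000.0\t10.0\n"
)
records = read_quant_sf(io.StringIO(example))

# Pick the transcript with the highest TPM.
top = max(records, key=lambda r: r["TPM"])
print(top["Name"], top["TPM"])
```

For a real run, the same function can be given an open file handle, for example `with open("reads_1/quant.sf") as fh: records = read_quant_sf(fh)`, where the path depends on where the output directory was saved.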

More information on the tool can be found here - http://salmon.readthedocs.io/en/latest/salmon.html

