|
Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to bmlau@email.arizona.edu. Thank you. |
{center}| [!A.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/A%29+Eliminate+small+transcripts] | !RightArrow.jpg! | [!B.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/B%29+Reduce+transcript+redundancy] | !RightArrow.jpg! | [!C.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/C%29+Identify+coding+sequences] | !RightArrow.jpg! | [!D.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/D%29+Rename+transcripts] | !RightArrow.jpg! | [!E.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/E%29+Split+RefSeq+files] | !RightArrow.jpg! | [!F.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/F%29+Map+transcripts] | !LeftArrow(2).jpg! | [!G.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/G%29+Combine+mapping+outputs] | !RightArrow.jpg! | [!H.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/H%29+Identify+best+matches] | !RightArrow.jpg! | [!I.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/I%29+Reformat+Blat+results] | !RightArrow.jpg! | [!J.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/J%29+Annotate+transcripts] | !RightArrow.jpg! | [!K.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/K%29+Map+RNA-Seq+reads+to+transcripts] | !RightArrow.jpg! | [!L.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/L%29+Reformat+mapping+output] | !LeftArrow(2).jpg! | [!M.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/M%29+Count+mapped+reads] | !RightArrow.jpg! | [!N.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/N%29+Trim+count+tables] | !RightArrow.jpg! | [!O.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/O%29+Combine+counts] | !RightArrow.jpg! | [!P.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/P%29+Determine+differential+expression] | !RightArrow.jpg! | [!Q.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/Q%29+Separate+transcripts+by+type] | !RightArrow.jpg! 
| [!R.jpg!|https://pods.iplantcollaborative.org/wiki/display/TUT/R%29+Generate+transcript+lists] |{center} |
Identify changes in gene expression levels between at least two sequenced transcriptome samples.
Approximate tutorial completion time: 3 hours, using the pre-computed iPlant sample data from a study of Belgica antarctica (Teets et al., 2012).
RNA-Seq refers to whole-transcriptome sequencing of cDNA, generally using a high-throughput ("next-generation") sequencing technology. RNA-Seq generates deep-coverage information about the mRNA in a sample. This information can be used for a variety of purposes, such as transcriptome assembly, gene discovery, and annotation. RNA-Seq is also used to detect differential transcript abundance between tissues, developmental stages, genetic backgrounds, and environmental conditions.
This RNA-Seq analysis tutorial differs from other RNA-Seq tutorials in that it does not require an assembled reference genome. It still requires an assembled transcriptome, however; assembly of transcriptomes is described in other tutorials such as Transcriptome Assembly (de novo) and BLAST a Transcriptome.
The protocol addresses seven primary objectives:
Additional sections consist of reformatting, splitting and combining result files (outputs) from a prior step into the inputs for a subsequent analysis.
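As an illustration of what such a reformatting and combining step involves, the following is a minimal Python sketch that trims per-sample count tables to two columns and then joins them on transcript name, roughly the kind of work done by the Cut Columns and Join multiple tab-delimited files apps. It is not the code those apps run. It assumes each input table has four tab-separated columns (transcript name, length, mapped reads, unmapped reads), as produced by `samtools idxstats`; whether the iPlant app emits exactly this layout is an assumption. The function names `trim_counts` and `combine_counts` and the example data are hypothetical.

```python
import csv
import io


def trim_counts(table_text, keep=(0, 2)):
    """Keep only the chosen columns (default: transcript name, mapped reads)."""
    rows = csv.reader(io.StringIO(table_text), delimiter="\t")
    return [[row[i] for i in keep] for row in rows if row]


def combine_counts(trimmed_tables):
    """Join trimmed tables on transcript name; a missing count becomes "0"."""
    names = []
    seen = set()
    # Collect transcript names in order of first appearance across samples.
    for table in trimmed_tables:
        for name, _count in table:
            if name not in seen:
                seen.add(name)
                names.append(name)
    lookups = [dict(table) for table in trimmed_tables]
    return [[name] + [lookup.get(name, "0") for lookup in lookups]
            for name in names]


# Made-up example tables (assumed idxstats-like layout), not the sample data.
control = "contig1\t900\t12\t0\ncontig2\t450\t3\t1\n"
dehydrated = "contig1\t900\t30\t0\ncontig3\t700\t8\t0\n"

combined = combine_counts([trim_counts(control), trim_counts(dehydrated)])
for row in combined:
    print("\t".join(row))
```

The result is one row per transcript with one count column per sample, which is the general shape of input that a differential expression tool expects.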
The sample/test data are derived from a set of studies, performed by Nicholas Teets and others, of an Antarctic flightless midge. The midge lives in an environment where liquid water is unavailable for much of the year, so it must tolerate dehydration in order to survive. The published RNA-Seq studies tested a number of conditions, including dehydration, and compared them to control conditions. The RNA-Seq reads are Illumina Genome Analyzer II reads retrieved from the NCBI Sequence Read Archive (SRA) at http://www.ncbi.nlm.nih.gov/sra/?term=Belgica%20antarctica. Because the reads appeared to be already trimmed when tested with FastQC, no further trimming was done. The reference for the study is: Gene expression changes governing extreme dehydration tolerance in an Antarctic insect. Nicholas M. Teets, Justin T. Peyton, Herve Colinet, David Renault, Joanna L. Kelley, Yuta Kawarasaki, Richard E. Lee, Jr, David L. Denlinger. Proc Natl Acad Sci U S A. 2012 December 11; 109(50): 20744--20749.
A. Eliminate small transcripts (app: Select contigs)
B. Reduce transcript redundancy (app: CD-HIT-est 4.6.1)
C. Identify coding sequences (app: Transcript decoder 1.0)
D. Rename transcripts (app: Linux stream editor)
E. Split RefSeq files (app: Split FASTA file)
F. Map transcripts (app: Blat (with options))
G. Combine mapping outputs (app: Concatenate Multiple Files)
H. Identify best matches (app: Best Hit for Blat Output)
I. Reformat Blat results (app: Cut Columns)
J. Annotate transcripts (app: Rename contigs 2.0)
K. Map RNA-Seq reads to transcripts (app: Bowtie-2.2.1--Build-and-Map)
L. Reformat mapping output (app: SAM to sorted BAM)
M. Count mapped reads (app: Index BAM and get stats)
N. Trim count tables (app: Cut Columns)
O. Combine counts (app: Join multiple tab-delimited files)
P. Determine differential expression (app: DESeq)
Q. Separate transcripts by type (app: Numeric Evaluation of a Data Column)
R. Generate transcript lists (app: Cut Columns)
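To make the first step of the workflow concrete, here is a minimal Python sketch of length-based filtering of a FASTA file, the idea behind step A (Eliminate small transcripts). It is an illustration only and not the implementation of the Select contigs app. The function names `read_fasta` and `select_contigs`, the example records, and the 10-base cutoff are hypothetical.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_contigs(lines, min_length):
    """Keep only records whose sequence is at least min_length bases long."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]


# Made-up records; the cutoff of 10 bases is for illustration only.
example = """>contig1
ACGTACGTACGT
>contig2
ACGT
>contig3
ACGTAC
GTACGT
""".splitlines()

kept = select_contigs(example, min_length=10)
for header, seq in kept:
    print(f">{header}\n{seq}")
```

Sequences split across several lines are joined before their length is measured, so line wrapping in the input does not affect which transcripts are kept.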
Approximate analysis durations for the iPlant sample data are provided in each step. With other data sets, analyses may take more or less time, depending on their size. Using the sample data, users can skip through the workflow (à la 'cooking show') and return later to examine the results of their own analysis.
REFERENCE:
Sample data: Nicholas M. Teets, Justin T. Peyton, Herve Colinet, David Renault, Joanna L. Kelley, Yuta Kawarasaki, Richard E. Lee, Jr, David L. Denlinger. Gene expression changes governing extreme dehydration tolerance in an Antarctic insect. Proc Natl Acad Sci U S A. 2012 December 11; 109(50): 20744--20749.