

Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to kchougul@cshl.edu. Thank you.

Rationale and background:

 

StringTie enables improved reconstruction of a transcriptome from RNA-seq reads

Mihaela Pertea, Geo M Pertea, Corina M Antonescu, Tsung-Cheng Chang, Joshua T Mendell & Steven L Salzberg

 doi:10.1038/nbt.3122

 

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantitate full-length transcripts representing multiple splice variants for each gene locus. Its input can include not only the alignments of raw reads used by other transcript assemblers, but also alignments of longer sequences that have been assembled from those reads. To identify differentially expressed genes between experiments, StringTie's output can be processed by specialized software such as Ballgown, Cuffdiff or other programs (DESeq2, edgeR, etc.).


Version: 1.3.3


 

The app expects the GTF files produced by StringTie to be located inside sample sub-directories, one per sample, within the main output directory. It generates two CSV files containing the count matrices for genes and transcripts, using the coverage values found in the output of stringtie -e. These matrices can be used in differential expression analysis tools such as DESeq2 and edgeR.
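The conversion from coverage to counts can be sketched as follows. This is a minimal illustration, assuming the relationship described in the StringTie manual for its count-preparation script (reads per transcript = coverage × transcript length / read length, rounded up); the function name `coverage_to_count` and the example numbers are hypothetical and are not part of the app.

```python
import math


def coverage_to_count(coverage, transcript_len, read_len=75):
    """Estimate a read count for one transcript from its average coverage.

    Assumption: count = ceil(coverage * transcript_len / read_len).
    read_len defaults to 75, the value shown for the average read length option.
    """
    if read_len <= 0:
        raise ValueError("read_len must be positive")
    # coverage * transcript_len approximates the total number of aligned bases;
    # dividing by the read length converts bases into a number of reads.
    return int(math.ceil(coverage * transcript_len / read_len))


# Hypothetical transcript: 1,500 bp long with an average coverage of 10.0
print(coverage_to_count(10.0, 1500))  # prints 200
```

Because the read length appears in the denominator, setting the average read length option to match the sequencing run matters: a value that is too small inflates every count by the same factor.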


Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
  2. Mandatory arguments -
    1. The parent directory of the sample sub-directories (each sub-directory holds the StringTie output for one sample in GTF format)
  3. Optional arguments -
    1. The average read length: 75
    2. Whether to cluster genes that overlap with different gene IDs: unchecked
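Before launching the app, it can help to confirm that the parent directory has the expected layout. The following is a minimal sketch for a local copy of the data, assuming each sample sub-directory holds one or more files ending in `.gtf`; the helper name `find_sample_gtfs` is hypothetical and is not part of the app.

```python
from pathlib import Path


def find_sample_gtfs(parent_dir):
    """Map each sample sub-directory name to the GTF file names it contains.

    Sub-directories without any *.gtf file are left out, so a missing
    sample is easy to spot by comparing against the expected sample list.
    """
    samples = {}
    for sub in sorted(Path(parent_dir).iterdir()):
        if not sub.is_dir():
            continue  # ignore loose files in the parent directory
        gtfs = sorted(p.name for p in sub.glob("*.gtf"))
        if gtfs:
            samples[sub.name] = gtfs
    return samples
```

Calling `find_sample_gtfs("ballgown_input_files")` on a downloaded copy of the test data would return one entry per sample sub-directory that contains a GTF file.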
Test/sample data 

The following test data are provided for testing StringTie-1.3.3_to_DESeq2_and_edegeR here - /iplant/home/shared/iplantcollaborative/example_data/StringTie/StringTie-1.3.3_to_DESeq2_and_edegeR:

  1. Directory of Ballgown output files -
    1. ballgown_input_files

Run StringTie-1.3.3_to_DESeq2_and_edegeR on the GTF files in the ballgown_input_files directory.

Results 

Successful execution of StringTie-1.3.3_to_DESeq2_and_edegeR will produce several files and directories. It generates two CSV files containing the count matrices for genes and transcripts, using the coverage values found in the output of stringtie -e:

  1. gene_count_matrix.csv - gene count matrix
  2. transcript_count_matrix.csv - transcript count matrix


These count matrices (CSV files) can then be imported into R for use by DESeq2 and edgeR (using the DESeqDataSetFromMatrix and DGEList functions, respectively).
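Before handing the matrices to R, a quick sanity check of the CSV layout can be useful. The sketch below uses only the Python standard library and assumes the first column holds the gene or transcript ID and each remaining column holds the integer counts for one sample; the function name `read_count_matrix` and the sample and gene names in the example are hypothetical.

```python
import csv
import io


def read_count_matrix(handle):
    """Read a count matrix CSV into (sample_names, {feature_id: [counts]})."""
    reader = csv.reader(handle)
    header = next(reader)
    sample_names = header[1:]  # first column is assumed to be the ID column
    counts = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = [int(value) for value in row[1:]]
    return sample_names, counts


# Hypothetical two-sample matrix standing in for gene_count_matrix.csv
example = io.StringIO("gene_id,sample1,sample2\ngeneA,10,0\ngeneB,3,7\n")
sample_names, counts = read_count_matrix(example)
print(sample_names)     # prints ['sample1', 'sample2']
print(counts["geneB"])  # prints [3, 7]
```

For a real run, pass an open file instead of the in-memory example, for instance `open("gene_count_matrix.csv", newline="")`.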

More information on the tool can be found here - https://ccb.jhu.edu/software/stringtie/index.shtml?t=manual

