


The DE Quick Start tutorial provides an introduction to basic Discovery Environment (DE) functionality and navigation.

Please work through the documentation and add your comments at the bottom of this page, or email them to support@cyverse.org. Thank you.

Rationale and background:

RMTA is a workflow that rapidly processes raw Illumina RNA-seq data by mapping reads with HISAT2 and then assembling transcripts with either Cufflinks or StringTie. RMTA can process FASTQ files containing paired-end or single-end reads. Alternatively, RMTA can directly process one or more Sequence Read Archive (SRA) runs from NCBI, identified by SRA ID.

RMTA minimally requires the following input data:

  1. Reference genome (FASTA) or HISAT2-indexed reference genome (in a subdirectory)
  2. Reference transcriptome (GFF3/GTF/GFF)
  3. RNA-seq reads (FASTQ), either single-end or paired-end, or a single NCBI SRA ID, or multiple NCBI SRA IDs (listed in a single-column text file)
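
For the multiple-SRA option, the input is a plain text file with one SRA ID per line. A minimal Python sketch for producing such a file is shown here; the function name `write_sra_list`, the file name `sra_ids.txt`, and the accession values are placeholders chosen for illustration and do not come from this tutorial.

```python
from pathlib import Path


def write_sra_list(ids, path):
    """Write one SRA ID per line (single column), skipping blanks and duplicates."""
    unique_ids = []
    for raw in ids:
        sra_id = raw.strip()
        if sra_id and sra_id not in unique_ids:
            unique_ids.append(sra_id)
    # One ID per line, with a trailing newline at the end of the file.
    Path(path).write_text("\n".join(unique_ids) + "\n")
    return unique_ids


if __name__ == "__main__":
    # Placeholder accessions for illustration only.
    write_sra_list(["SRR0000001", "SRR0000002"], "sra_ids.txt")
```

The resulting file can then be uploaded to the Data Store and selected as the file containing SRA IDs.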


Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account at user.cyverse.org.)
  2. Mandatory arguments 
    1. Hisat2 reference genome: Select at least one of the following three options for indexing the reference genome
      1. Custom Reference genome
      2. Select reference genome from the list
      3. Hisat2 Indexed folder
    2. Hisat2 reference annotation: Select at least one of the following two options to use as the annotation
      1. Custom Reference annotation
      2. Select reference annotation from the list

    Use one of the following three read inputs (items 3 to 5):
    3. Paired-end reads
      1. FASTQ Files (Read 1): Input read 1 file(s) of the paired-end data
      2. FASTQ Files (Read 2): Input read 2 file(s) of the paired-end data
    4. Single-end reads
      1. Single-end FASTQ files
    5. SRA
      1. SRA ID: A single SRA ID that you want to use
      2. File containing SRA IDs: Multiple SRA IDs that you want to use
    6. Cufflinks/Stringtie: Check exactly one of the two assemblers below; you cannot select both
      1. StringTie
      2. Cufflinks
      3. Coverage cut-off threshold: Select a value from 0 to 5
      4. FPKM cut-off threshold: The FPKM cut-off used to filter the transcripts
    7. Cuffmerge: Run Cuffmerge on the StringTie/Cufflinks GTF files (only works with more than one sample file)
  3. Advanced options
    1. Phred quality score: Encoding of the quality scores. Select Phred64 if your reads use it (default is Phred33)
    2. Fragment Library Type: Specify the library type, for example FR, RF, F, or R
    3. Trim bases from 5' end of read: Trim bases from 5' (left) end of each read before alignment
    4. Trim bases from 3' end of read: Trim bases from 3' (right) end of each read before alignment

    5. Minimum intron length: Set minimum intron length
    6. Maximum intron length: Set maximum intron length
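
The Phred encoding and trimming options above can be illustrated with a short Python sketch. This only demonstrates the concepts (an ASCII offset of 33 or 64 for quality characters, and removing bases from the left and right ends of a read); it is not how RMTA or HISAT2 implements them, and the function names `decode_quality` and `trim_read` are hypothetical.

```python
def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string to Phred scores; offset is 33 or 64."""
    if offset not in (33, 64):
        raise ValueError("offset must be 33 or 64")
    scores = [ord(ch) - offset for ch in quality]
    # A negative score suggests the wrong encoding was chosen for these reads.
    if min(scores, default=0) < 0:
        raise ValueError("negative score: wrong offset for this encoding?")
    return scores


def trim_read(seq, trim5=0, trim3=0):
    """Drop trim5 bases from the 5' (left) end and trim3 bases from the 3' (right) end."""
    end = len(seq) - trim3
    # If the trims overlap, the result is an empty read.
    return seq[trim5:max(end, trim5)]


if __name__ == "__main__":
    print(decode_quality("II5", offset=33))
    print(trim_read("ACGTACGT", trim5=2, trim3=3))
```

Decoding Phred33 data with an offset of 64 typically produces negative scores, which is why choosing the correct encoding matters before alignment.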

Test/sample data

The following test data for RMTA are provided at /iplant/home/shared/iplantcollaborative/example_data/RMTA

  1. Reference Genome: Sorghum_bicolor.Sorbi1.20.dna.toplevel_chr8.fa
  2. Reference Annotation: Sorghum_bicolor.Sorbi1.20_chr8.gtf
  3. Left reads (Read 1): sample_1_R1.fq.gz
  4. Right reads (Read 2): sample_1_R2.fq.gz
  5. Cufflinks/Stringtie: StringTie
  6. Fragment Library Type: FR

Leave all other options at their default values.

Results 

Successful execution of RMTA will generate two output folders:

  1. Index: This folder contains the index of the genome
  2. Output: This folder contains the output from HISAT2, StringTie and Cuffcompare (please refer to each program's manual for an explanation of its outputs)
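
To give a feel for what the coverage and FPKM cut-offs do, the following Python sketch filters transcript records from a GTF file. It assumes StringTie-style transcript lines whose attribute column carries `cov` and `FPKM` values, and it assumes an inclusive comparison (values equal to the cut-off are kept); neither assumption is confirmed by this tutorial, and the function name `passes_cutoffs` is hypothetical. RMTA applies its own filtering, so this is for inspection only.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTR = re.compile(r'(\S+) "([^"]*)"')


def passes_cutoffs(gtf_line, min_fpkm=0.0, min_cov=0.0):
    """Return True for a transcript line whose FPKM and coverage meet the cut-offs."""
    fields = gtf_line.rstrip("\n").split("\t")
    # GTF has nine tab-separated columns; only transcript records are considered.
    if len(fields) != 9 or fields[2] != "transcript":
        return False
    attrs = dict(ATTR.findall(fields[8]))
    try:
        fpkm = float(attrs["FPKM"])
        cov = float(attrs["cov"])
    except (KeyError, ValueError):
        return False
    return fpkm >= min_fpkm and cov >= min_cov


if __name__ == "__main__":
    # Made-up record for illustration; not taken from the test data.
    example = ('chr8\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
               'gene_id "STRG.1"; transcript_id "STRG.1.1"; '
               'cov "12.5"; FPKM "3.2"; TPM "4.1";')
    print(passes_cutoffs(example, min_fpkm=1.0, min_cov=2.0))
```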