This space is home to learning materials and tutorials created for CyVerse products and services. To search the entire CyVerse wiki, use the box at the upper right.


LEARNING MATERIALS



Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account at user.cyverse.org)
  2. Mandatory arguments 
    1. Output folder name
    2. Input file reference genome sequence in fasta format
    3. Hisat2 reference genome: Select at least one of the below three options for the indexing of the Reference Genome
      1. Custom Reference genome
      2. Select reference genome from the list
      3. Hisat2 Indexed folder
    4. Hisat2 reference annotation: Select at least one of the below two options for using as annotation
      1. Custom Reference annotation
      2. Select reference annotation from the list

      Use one of the following three input types (items 5 to 7):
    5. Paired-end reads
      1. FASTQ Files (Read 1): Input reads 1 files of paired-end data
      2. FASTQ Files (Read 2): Input reads 2 files of paired-end data
    6. Single-end reads
      1. FASTQ files: Input reads of single-end data
    7. SRA
      1. SRA ID: Single SRA ID that you want to use
      2. File containing SRA IDs: Multiple SRA IDs that you want to use
    8. Fragment Library Type: Specify the format of the library. More details: http://sailfish.readthedocs.io/en/master/library_type.html
    9. File type: Enter whether the library is paired end or single end
    Optional arguments:
    1. Cufflinks/StringTie: Check only one of the two options below; you cannot select both
      1. StringTie
      2. Cufflinks
      3. Coverage cut-off threshold: Select from 0-5
      4. FPKM cut-off threshold: FPKM cut-off used to filter the transcripts
    2. Cuffmerge: Run Cuffmerge on the StringTie/Cufflinks GTFs (only works with more than one sample file)
  3. Advanced options
    1. Phred quality score: Encoding for quality scores; select Phred64 if needed (default is Phred33)
    2. Fragment Library Type: Specify the format of the library (FR, RF, F, R, etc.)
    3. Trim bases from 5' end of read: Trim bases from the 5' (left) end of each read before alignment
    4. Trim bases from 3' end of read: Trim bases from the 3' (right) end of each read before alignment
    5. Minimum intron length: Set the minimum intron length
    6. Maximum intron length: Set the maximum intron length
    7. Report alignments tailored for transcript assemblers including StringTie: With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short anchors, which helps transcript assemblers improve significantly in computation and memory usage.
    8. Minimum fragment length for valid paired-end alignments: The minimum fragment length for valid paired-end alignments.
    9. Maximum fragment length for valid paired-end alignments: The maximum fragment length for valid paired-end alignments.
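The selection rules in the list above (one input type, StringTie or Cufflinks but not both, Cuffmerge only with more than one sample) can be sketched as a small pre-submission check. This is only an illustration: `RmtaOptions`, its field names, and `validate` are hypothetical and are not part of RMTA or the CyVerse Discovery Environment, and reading "use one of the following three" as "exactly one" is an assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RmtaOptions:
    """Hypothetical container mirroring the form fields described above."""
    read1_files: List[str] = field(default_factory=list)       # paired-end Read 1
    read2_files: List[str] = field(default_factory=list)       # paired-end Read 2
    single_end_files: List[str] = field(default_factory=list)  # single-end FASTQ
    sra_id: Optional[str] = None
    sra_id_file: Optional[str] = None
    use_stringtie: bool = False
    use_cufflinks: bool = False
    run_cuffmerge: bool = False


def validate(opts: RmtaOptions) -> List[str]:
    """Return a list of problems; an empty list means the selection looks consistent."""
    problems = []

    paired = bool(opts.read1_files or opts.read2_files)
    single = bool(opts.single_end_files)
    sra = bool(opts.sra_id or opts.sra_id_file)
    # Assumption: exactly one of the three input types should be filled in.
    if sum([paired, single, sra]) != 1:
        problems.append("choose exactly one of paired-end, single-end, or SRA input")

    if paired and len(opts.read1_files) != len(opts.read2_files):
        problems.append("Read 1 and Read 2 need the same number of files")

    if opts.use_stringtie and opts.use_cufflinks:
        problems.append("StringTie and Cufflinks cannot both be selected")

    # The sample count is unknown for a file of SRA IDs, so that case is not checked.
    n_samples = max(len(opts.read1_files), len(opts.single_end_files))
    if opts.run_cuffmerge and n_samples < 2 and not opts.sra_id_file:
        problems.append("Cuffmerge only works with more than one sample")

    return problems
```

Returning a list of messages rather than raising on the first problem lets a user see every conflicting choice at once before launching a job.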

Test/sample data

The following test data are provided for testing HISAT2 RMTA here: /iplant/home/shared/iplantcollaborative/example_data/tophat2-PE (we will use data similar to that used for tophat2-PE):

  1. Reference Genome: Sorghum_bicolor.Sorbi1.20.dna.toplevel_chr8.fa
  2. Reference Annotation: Sorghum_bicolor.Sorbi1.20_chr8.gtf
  3. left_reads: SRR946914_fastq_1.fastq, SRR946916_fastq_1.fastq
  4. right_reads: SRR946914_fastq_2.fastq, SRR946916_fastq_2.fastq
  5. reference-NC_010473.fa
  6. R2.fq.gz
  7.  Stringtie
  8. Fragment Library Type: FR

Leave the rest as default
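When several paired-end samples are supplied, the Read 1 and Read 2 lists need to line up sample by sample. The sketch below pairs files by sample name. It assumes the `<sample>_fastq_1.fastq` / `<sample>_fastq_2.fastq` naming suggested by the output file names in the Results section; `pair_reads` is a hypothetical helper and not part of RMTA.

```python
import re
from typing import Dict, List, Tuple

# Assumed naming convention: <sample>_fastq_1.fastq and <sample>_fastq_2.fastq
_NAME = re.compile(r"^(?P<sample>.+)_fastq_(?P<mate>[12])\.fastq$")


def _by_sample(files: List[str], mate: str) -> Dict[str, str]:
    """Map sample name -> file name, rejecting files that are not the given mate."""
    out = {}
    for name in files:
        m = _NAME.match(name)
        if m is None or m.group("mate") != mate:
            raise ValueError(f"unexpected file name for mate {mate}: {name}")
        out[m.group("sample")] = name
    return out


def pair_reads(left: List[str], right: List[str]) -> Dict[str, Tuple[str, str]]:
    """Pair Read 1 and Read 2 files that share the same sample prefix."""
    r1 = _by_sample(left, "1")
    r2 = _by_sample(right, "2")
    if r1.keys() != r2.keys():
        unmatched = sorted(set(r1) ^ set(r2))
        raise ValueError(f"samples without a mate file: {unmatched}")
    return {sample: (r1[sample], r2[sample]) for sample in sorted(r1)}


pairs = pair_reads(
    ["SRR946914_fastq_1.fastq", "SRR946916_fastq_1.fastq"],
    ["SRR946916_fastq_2.fastq", "SRR946914_fastq_2.fastq"],
)
```

Matching on the sample prefix instead of list position means the two lists can be given in any order.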

Results 

Successful execution of the HISAT2-index-align assessment pipeline will create a directory named out. The directory will contain BAM and BAI files for each sample. These files can be used for downstream analysis and visualization:

 

output

SRR946914_fastq_1.sorted.bam
SRR946914_fastq_1.sorted.bam.bai
SRR946916_fastq_1.sorted.bam
SRR946916_fastq_1.sorted.bam.bai
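A quick way to confirm that a run finished cleanly is to check that every sorted BAM file in the listing has a matching BAI index. The following is a minimal sketch; `missing_indexes` is a hypothetical helper, and the `.sorted.bam` / `.sorted.bam.bai` suffixes are taken from the file names listed above.

```python
from typing import Iterable, List


def missing_indexes(filenames: Iterable[str]) -> List[str]:
    """Return the sorted BAM files that have no accompanying .bai index."""
    names = {name.strip() for name in filenames}
    bams = sorted(n for n in names if n.endswith(".sorted.bam"))
    return [bam for bam in bams if bam + ".bai" not in names]


listing = [
    "SRR946914_fastq_1.sorted.bam",
    "SRR946914_fastq_1.sorted.bam.bai",
    "SRR946916_fastq_1.sorted.bam",
    "SRR946916_fastq_1.sorted.bam.bai",
]
unindexed = missing_indexes(listing)  # empty when every BAM has an index
```

For a downloaded results folder, the listing could come from `os.listdir()` on that folder instead of a hard-coded list.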

 

RMTA will generate two output folders:

  1. Index: This folder contains the index of the genome
  2. Output: This folder contains the output from HISAT2, StringTie and Cuffcompare (please refer to the manuals of these individual programs for an explanation of their outputs)