The applications listed here are available for use in the Discovery Environment and are documented in: Discovery Environment Manual.

Discovery Environment Applications List


Please work through the documentation and add your comments to the bottom of this page, or email comments to support@cyverse.org. Thank you.

 

Rationale and background:

HISAT2: Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown

Mihaela Pertea, Daehwan Kim, Geo M Pertea, Jeffrey T Leek, Steven L Salzberg

Nature Protocols 11, 1650–1667 (2016). doi:10.1038/nprot.2016.095

 

HISAT, StringTie and Ballgown provide a complete analysis package (the 'new Tuxedo' package). RNA-seq analysis begins by mapping reads against a reference genome to identify their genomic positions. This mapping information allows us to collect subsets of the reads corresponding to each gene, and then to assemble and quantify the transcripts represented by those reads. HISAT2 is an efficient spliced aligner that replaces TopHat in the new Tuxedo protocol. It uses one global FM index along with many small local FM indexes, an efficient data structure that makes its alignment several times faster than TopHat2. If a reference annotation is provided, HISAT2 can extract the splice site and exon information with its built-in Python scripts extract_splice_sites.py and extract_exons.py, respectively. The wrapper script then takes the built index and aligns the reads against the reference.
HISAT2 software (http://ccb.jhu.edu/software/hisat2 or http://github.com/infphilo/hisat2, version 2.0.1 or later)
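
The index-then-align flow described above can be sketched as two command lines. This is a minimal illustration, not the actual wrapper script used by the app: the helper names build_index_cmd and align_cmd are hypothetical, and the commands are only assembled as lists, not executed. The flags shown (hisat2-build --ss/--exon, hisat2 -x, -1, -2, -U, -S) are standard HISAT2 options.

```python
def build_index_cmd(reference_fasta, index_base, splice_sites=None, exons=None):
    """Assemble a hisat2-build command (hypothetical helper, not the app's wrapper).

    splice_sites and exons are the text files produced by the extraction
    scripts; they are only passed when both are available.
    """
    cmd = ["hisat2-build"]
    if splice_sites and exons:
        cmd += ["--ss", splice_sites, "--exon", exons]
    cmd += [reference_fasta, index_base]
    return cmd


def align_cmd(index_base, reads1, reads2=None, sam_out="aligned.sam"):
    """Assemble a hisat2 alignment command for paired-end or single-end reads."""
    cmd = ["hisat2", "-x", index_base]
    if reads2:
        # Paired-end: read 1 and read 2 files are given separately.
        cmd += ["-1", reads1, "-2", reads2]
    else:
        # Single-end: all reads are passed as unpaired.
        cmd += ["-U", reads1]
    cmd += ["-S", sam_out]
    return cmd
```

For example, build_index_cmd("NC_010473.fa", "NC_010473") yields the list for `hisat2-build NC_010473.fa NC_010473`, which could then be passed to subprocess.run on a machine where HISAT2 is installed.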

 

Pre-Requisites

A CyVerse account. (Register for a CyVerse account at user.cyverse.org)
Mandatory arguments
Output folder name
Input file: Reference genome sequence in FASTA format
FASTQ Files (Read 1): Input read 1 files of paired-end data, or the reads of single-end data
FASTQ Files (Read 2): Input read 2 files of paired-end data; leave this field empty for single-end data
Fragment Library Type: Specify the library type; for more details see http://sailfish.readthedocs.io/en/master/library_type.html
File type: Specify whether the library is paired end or single end
Optional arguments:
Trim bases from 5' end of read: Trim bases from the 5' (left) end of each read before alignment
Trim bases from 3' end of read: Trim bases from the 3' (right) end of each read before alignment
Phred quality score: Encoding used for the quality scores
Minimum intron length: Sets the minimum intron length
Maximum intron length: Sets the maximum intron length
Report alignments tailored for transcript assemblers including StringTie: With this option, HISAT2 requires longer anchor lengths for de novo discovery of splice sites. This leads to fewer alignments with short anchors, which significantly reduces the computation and memory usage of transcript assemblers.
Minimum fragment length for valid paired-end alignments: The minimum fragment length for valid paired-end alignments.
Maximum fragment length for valid paired-end alignments: The maximum fragment length for valid paired-end alignments.
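
As a rough guide to what the optional arguments above do, the sketch below maps them onto HISAT2 command-line flags. The mapping is an assumption about how the app's fields correspond to the HISAT2 options of the same meaning (--trim5, --trim3, --phred33/--phred64, --min-intronlen, --max-intronlen, --dta, --minins, --maxins); the function optional_flags and its parameter names are hypothetical and are not taken from the app itself.

```python
def optional_flags(trim5=0, trim3=0, phred=33, min_intron=None,
                   max_intron=None, dta=False, min_frag=None, max_frag=None):
    """Translate the optional app fields into HISAT2 flags (assumed mapping)."""
    flags = []
    if trim5:
        flags += ["--trim5", str(trim5)]   # bases trimmed from the 5' (left) end
    if trim3:
        flags += ["--trim3", str(trim3)]   # bases trimmed from the 3' (right) end
    if phred == 33:
        flags.append("--phred33")
    elif phred == 64:
        flags.append("--phred64")
    else:
        raise ValueError("phred must be 33 or 64")
    if min_intron is not None:
        flags += ["--min-intronlen", str(min_intron)]
    if max_intron is not None:
        flags += ["--max-intronlen", str(max_intron)]
    if dta:
        flags.append("--dta")              # alignments tailored for StringTie
    if min_frag is not None:
        flags += ["--minins", str(min_frag)]
    if max_frag is not None:
        flags += ["--maxins", str(max_frag)]
    return flags
```

Calling optional_flags(dta=True) returns ["--phred33", "--dta"]; the resulting list can be appended to a base hisat2 command before the read files are given.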

Test/sample data
The following test data are provided for testing HISAT2 in /iplant/home/shared/iplantcollaborative/example_data/tophat2-PE (we use data similar to that used for tophat2-PE):

left_reads: SRR946914_fastq_1.fastq, SRR946916_fastq_1.fastq
right_reads: SRR946914_fastq_2.fastq, SRR946916_fastq_2.fastq
reference: NC_010473.fa
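
For paired-end data, each read 1 file has to be matched with the read 2 file of the same sample. The sketch below pairs the test files by name, assuming the _1.fastq / _2.fastq suffix convention seen in the file names above; pair_reads is a hypothetical helper and not part of the app.

```python
def pair_reads(left_reads, right_reads, left_suffix="_1.fastq", right_suffix="_2.fastq"):
    """Pair read 1 and read 2 files that share the same sample prefix."""
    def sample_key(name, suffix):
        if not name.endswith(suffix):
            raise ValueError(f"{name} does not end with {suffix}")
        return name[: -len(suffix)]

    # Index the read 2 files by their sample prefix.
    right_by_key = {sample_key(r, right_suffix): r for r in right_reads}

    pairs = []
    for left in left_reads:
        key = sample_key(left, left_suffix)
        if key not in right_by_key:
            raise ValueError(f"no read 2 file found for {left}")
        pairs.append((left, right_by_key[key]))
    return pairs


left = ["SRR946914_fastq_1.fastq", "SRR946916_fastq_1.fastq"]
right = ["SRR946916_fastq_2.fastq", "SRR946914_fastq_2.fastq"]
pairs = pair_reads(left, right)
```

The pairs follow the order of the read 1 list, so the order in which the read 2 files are listed does not matter.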
Results
Successful execution of the HISAT2-index-align pipeline will create a directory named out. The directory will contain BAM and BAI files for each sample, which can be used for downstream analysis and visualization:

 

output

SRR946914_fastq_1.sorted.bam

SRR946914_fastq_1.sorted.bam.bai

SRR946916_fastq_1.sorted.bam

SRR946916_fastq_1.sorted.bam.bai
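
A quick way to confirm that a run completed is to check that each sample has both files. The sketch below derives the expected names from the read 1 file names, assuming the naming pattern shown in the listing above (read 1 file name without its extension, followed by .sorted.bam and .sorted.bam.bai); the helper names and the output directory argument are hypothetical.

```python
from pathlib import Path


def expected_outputs(read1_files):
    """List the BAM and BAI file names expected for each read 1 file."""
    names = []
    for fastq in read1_files:
        stem = Path(fastq).stem  # e.g. SRR946914_fastq_1
        names += [f"{stem}.sorted.bam", f"{stem}.sorted.bam.bai"]
    return names


def missing_outputs(output_dir, read1_files):
    """Return the expected files that are not present in output_dir."""
    out = Path(output_dir)
    return [name for name in expected_outputs(read1_files)
            if not (out / name).exists()]
```

An empty list from missing_outputs means that every sample has its sorted BAM file and the matching index.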