Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to kchougul@cshl.edu. Thank you.

Rationale and background:

STAR: ultrafast universal RNA-seq aligner

Alexander Dobin, Carrie A. Davis, Felix Schlesinger, Jorg Drenkow, Chris Zaleski, Sonali Jha, Philippe Batut, Mark Chaisson, and Thomas R. Gingeras

doi: 10.1093/bioinformatics/bts635

Spliced Transcripts Alignment to a Reference (STAR) is another highly cited splice-aware aligner. It outperforms other aligners in alignment speed. Its algorithm uses a sequential maximum mappable seed search in uncompressed suffix arrays, followed by a seed clustering and stitching procedure. STAR can be run in two-pass mode to improve splice junction accuracy, i.e. the splice loci found in the first pass are supplied to the second mapping pass. STAR also works well with long reads, with accuracy comparable to BLAT, which is used to map long reads. The STAR mapping workflow involves two steps: generating the genome index files and then mapping the reads against the genome. This app performs both the indexing and the alignment of reads against the reference genome.
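
The two-step workflow described above can be sketched in Python as two STAR command lines, one for index generation and one for mapping. This is only an illustration, not the app's actual wrapper: the helper names (build_index_cmd, build_align_cmd), the thread count, the "index" directory and the "sample1." prefix are assumptions. The STAR flags themselves are standard STAR options.

```python
def build_index_cmd(genome_dir, fasta, gtf, threads=4):
    """Step 1: command line that generates the STAR genome index."""
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", str(genome_dir),
        "--genomeFastaFiles", str(fasta),
        "--sjdbGTFfile", str(gtf),
        "--runThreadN", str(threads),
    ]


def build_align_cmd(genome_dir, reads, prefix, threads=4):
    """Step 2: command line that maps one sample (1 file for SE, 2 for PE)."""
    cmd = [
        "STAR",
        "--genomeDir", str(genome_dir),
        "--readFilesIn", *[str(r) for r in reads],
        "--outFileNamePrefix", prefix,
        "--runThreadN", str(threads),
    ]
    # gzip-compressed reads have to be decompressed on the fly
    if all(str(r).endswith(".gz") for r in reads):
        cmd += ["--readFilesCommand", "zcat"]
    return cmd


# Hypothetical paths, modelled on the test data names used in this tutorial.
index_cmd = build_index_cmd("index", "reference.fasta", "reference.gtf")
align_cmd = build_align_cmd(
    "index",
    ["reads/sample1.1.fastq.gz", "reads/sample1.2.fastq.gz"],
    prefix="sample1.",
)
print(" ".join(index_cmd))
print(" ".join(align_cmd))
```

The commands are built as lists and only printed here. On a machine with STAR installed, each list could be passed to subprocess.run(cmd, check=True), running the index command first.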


Version: 2.5.3.a


Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
  2. Mandatory arguments
    1. Genome reference sequence file name (in FASTA format)
    2. Genome reference annotation file name (in GTF format)
    3. FASTQ files (PE or SE reads)
    4. File type (paired-end, PE, or single-end, SE)
  3. Optional arguments
    1. Output BAM sorting: SortedByCoordinate (sorts the BAM file by coordinate, which is useful for downstream analysis)
    2. Output quantification method (type of quantification requested): output SAM/BAM alignments to the transcriptome
    3. Compatibility with Cufflinks and StringTie: set this to 0 for compatibility
    4. Maximum number of multiple alignments allowed for a read: 20
    5. Minimum overhang for unannotated junctions: 8
    6. Minimum overhang for annotated junctions: 1
    7. Maximum number of mismatches per pair: 999
    8. Minimum intron length: 20
    9. Maximum intron length: 1000000
    10. Maximum genomic distance between mates: 1000000
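
For readers who want to reproduce these settings outside the Discovery Environment, the optional arguments above can be expressed as STAR command-line flags. The sketch below is an assumption about how the app's options map to STAR: the flag names are real STAR options whose documented meanings match the descriptions in the list, but the page does not confirm that the app uses exactly these flags. The helper optional_args is hypothetical.

```python
# Assumed mapping from the app's optional arguments to STAR flags.
# Values are the defaults listed on this page.
OPTIONAL_DEFAULTS = {
    "--outSAMtype": ["BAM", "SortedByCoordinate"],  # output BAM sorting
    "--quantMode": ["TranscriptomeSAM"],            # alignments to transcriptome
    "--outSAMattrIHstart": ["0"],                   # Cufflinks/StringTie compatibility
    "--outFilterMultimapNmax": ["20"],              # max multiple alignments per read
    "--alignSJoverhangMin": ["8"],                  # min overhang, unannotated junctions
    "--alignSJDBoverhangMin": ["1"],                # min overhang, annotated junctions
    "--outFilterMismatchNmax": ["999"],             # max mismatches per pair
    "--alignIntronMin": ["20"],                     # min intron length
    "--alignIntronMax": ["1000000"],                # max intron length
    "--alignMatesGapMax": ["1000000"],              # max genomic distance between mates
}


def optional_args(overrides=None):
    """Return a flat argument list, with user overrides applied to the defaults."""
    merged = dict(OPTIONAL_DEFAULTS)
    for flag, value in (overrides or {}).items():
        if flag not in OPTIONAL_DEFAULTS:
            raise KeyError(f"unknown option: {flag}")
        values = value if isinstance(value, (list, tuple)) else [value]
        merged[flag] = [str(v) for v in values]
    args = []
    for flag, values in merged.items():
        args.append(flag)
        args.extend(values)
    return args


print(" ".join(optional_args({"--outFilterMultimapNmax": 10})))
```

The returned list can be appended to a STAR mapping command. Unknown flags raise an error, so a typo does not end up on the command line unnoticed.
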

Test/sample data

The following test data are provided for testing Star-index-align_2.5.3.a in /iplant/home/shared/iplantcollaborative/example_data/Star/STAR-2.5.2:

  1. Reference genome file - reference.fasta
  2. Reference GTF file - reference.gtf
  3. Directory of FASTQ files (fastq, fq, gz, bz2) -
    1. reads/sample1.1.fastq.gz reads/sample1.2.fastq.gz
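
When preparing a directory of paired-end reads, it can help to check how the files group into samples. The sketch below assumes the "<sample>.<mate>.<extension>" naming seen in the test data (sample1.1.fastq.gz, sample1.2.fastq.gz). That convention is inferred from the file names above and is not a documented requirement of the app. The function group_fastqs is hypothetical, and single-end files without a mate number are skipped.

```python
import re
from collections import defaultdict

# Matches e.g. "sample1.1.fastq.gz": sample name, mate number (1 or 2),
# then fastq/fq with an optional gz or bz2 suffix. Assumed naming scheme.
FASTQ_RE = re.compile(
    r"^(?P<sample>.+)\.(?P<mate>[12])\.(?:fastq|fq)(?:\.(?:gz|bz2))?$"
)


def group_fastqs(filenames):
    """Group FASTQ paths by sample name, ordered mate 1 then mate 2."""
    samples = defaultdict(dict)
    for name in filenames:
        base = name.rsplit("/", 1)[-1]  # strip any directory part
        match = FASTQ_RE.match(base)
        if match is None:
            continue  # not a recognised paired FASTQ name
        samples[match["sample"]][int(match["mate"])] = name
    return {
        sample: [mates[k] for k in sorted(mates)]
        for sample, mates in sorted(samples.items())
    }


pairs = group_fastqs(["reads/sample1.2.fastq.gz", "reads/sample1.1.fastq.gz"])
print(pairs)
```
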

Run Star-index-align_2.5.3.a on FASTQ files using reference files.

Results 

Successful execution of Star-index-align_2.5.3.a will produce several files and directories:

  • index: STAR genome indices
  • bam_output: directory containing only the BAM files for all samples
    • sample1.Aligned.sortedByCoord.out.bam
  • output: individual sample
  • STAR_output: default output files from STAR, which include
    • Log.out: main log file with detailed information about the run. This file is most useful for troubleshooting and debugging.
    • sample1.Log.progress.out: reports job progress statistics, such as the number of processed reads, % of mapped reads, etc.
    • sample1.Log.final.out: summary mapping statistics after the mapping job is complete; very useful for quality control.
    • sample1.Aligned.out.bam: alignments in BAM format.
    • sample1.SJ.out.tab: high-confidence collapsed splice junctions in tab-delimited format.
    • sample1.Unmapped.out.mate1: mate 1 of unmapped and partially mapped reads (i.e. only one mate of a paired-end read mapped), written to a separate file
    • sample1.Unmapped.out.mate2: mate 2 of unmapped and partially mapped reads (i.e. only one mate of a paired-end read mapped), written to a separate file
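
Since Log.final.out is the file most often used for quality control, a small parser can make its statistics easier to check across samples. The sketch below relies on the "name | value" line layout that STAR uses in this file. The function names and the example values are illustrative assumptions, not results from the test data.

```python
def parse_log_final(text):
    """Parse the 'name | value' lines of a STAR Log.final.out file into a dict."""
    stats = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # section headers such as "UNIQUE READS:" carry no value
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def percent(stats, key):
    """Convert a percentage field such as '90.00%' to a float."""
    return float(stats[key].rstrip("%"))


# Made-up excerpt in the Log.final.out layout, for illustration only.
example = (
    "                          Number of input reads |\t1000\n"
    "                                  UNIQUE READS:\n"
    "                   Uniquely mapped reads number |\t900\n"
    "                        Uniquely mapped reads % |\t90.00%\n"
)
stats = parse_log_final(example)
print(stats["Number of input reads"])
print(percent(stats, "Uniquely mapped reads %"))
```

For a real run, the text would be read from the sample's Log.final.out file (for example sample1.Log.final.out) instead of the inline example.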


More information on the tool can be found here - https://github.com/alexdobin/STAR

