The applications listed here are available for use in the Discovery Environment and are documented in: Discovery Environment Manual.

Discovery Environment Applications List


...


Pre-Requisites:

  1. A CyVerse account (Register for a free CyVerse account at https://user.cyverse.org). 

  2. An up-to-date Java-enabled web browser. (Firefox recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable due to its issues in utilizing 64-bit Java.)

  3. Mandatory arguments 

    1. Reference genome 

      Note

      Select at least one of the below two options for the indexing of the Reference Genome

      1. Custom genome (required)
      2. Hisat2 Indexed folder (for indexed genomes)


    2. Reference annotation

      1. Custom Reference annotation
      2. Reference Annotation from the list

      Info

      If you have many files to process through the Discovery Environment, an HT Analysis Path List File may prove useful, as this app takes only one file at a time. For information on how to create an HT path list, click here.


      Note

      Only one of the following three read options (Paired-end reads, Single-end reads, or SRA) may be selected per job.

    3. Paired-end reads

      1. FASTQ Files (Read 1): HT path list of read 1 files of paired-end data
      2. FASTQ Files (Read 2): HT path list of read 2 files of paired-end data

        Note

        When adding multiple paired-end FASTQ files, list the Read 2 files in the same order as the Read 1 files, so that the SRA ids in both lists (Reads 1 and 2) line up position by position.
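The ordering requirement above can be checked before submitting a job. The following is a minimal sketch, not part of RMTA: it assumes mate files are named with a trailing `_1`/`_2` or `_R1`/`_R2` tag before a `.fastq`/`.fq` extension, and the function names and example paths are hypothetical.

```python
import re
from pathlib import PurePosixPath


def sample_key(path: str) -> str:
    """Return the file name without its FASTQ extension and mate tag.

    Assumes names such as SRR001_1.fastq.gz or sampleA_R2.fq (an assumption,
    not an RMTA requirement).
    """
    name = PurePosixPath(path).name
    name = re.sub(r"\.(fastq|fq)(\.gz)?$", "", name)
    return re.sub(r"[_.]R?[12]$", "", name)


def mismatched_pairs(read1_paths, read2_paths):
    """Return (index, read1, read2) for each position whose sample keys differ."""
    if len(read1_paths) != len(read2_paths):
        raise ValueError("Read 1 and Read 2 lists have different lengths")
    return [
        (i, r1, r2)
        for i, (r1, r2) in enumerate(zip(read1_paths, read2_paths))
        if sample_key(r1) != sample_key(r2)
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    read1 = ["/data/reads/SRR001_1.fastq.gz", "/data/reads/SRR002_1.fastq.gz"]
    read2 = ["/data/reads/SRR002_2.fastq.gz", "/data/reads/SRR001_2.fastq.gz"]
    for index, r1, r2 in mismatched_pairs(read1, read2):
        print(f"position {index}: {r1} is paired with {r2}")
```

An empty result means every Read 1 entry sits at the same position as its Read 2 mate under this naming assumption.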

    4. Single-end reads

      1. Single end FASTQ files: HT path list of read files of single-end data


    5. SRA

      1. Enter the SRA id, or
      2. Select a file containing SRA ids: HT path list of multiple SRA ids list files


    6. Aligners

      Note

      Only one of the below two options needs to be selected. Both cannot be selected.

      1. Hisat2 (default)

      2. Bowtie2

        Info: When to use Hisat2 or Bowtie2

        Hisat2 is a splice-aware algorithm used to perform reference genome-based read mapping. Stringtie is then used to assemble transcripts based on this read mapping.

        The read aligner Bowtie2 has been included as an optional aligner in the RMTA workflow for users wishing to call single nucleotide polymorphisms (SNPs) from their RNA-seq (or DNA-seq) data in a high-throughput manner. When the Bowtie2 option is selected, Hisat2 and Stringtie are both removed from the workflow, but the additional option to remove duplicate reads (important for population-level analyses) becomes available.

    7. Featurecount
      1. Choose a Feature Type. The default option will be "exon"
      2. Choose a Gene Attribute. The default option will be "gene_id"
      3. Select the Type of Strandedness. The three options are unstranded, stranded, and reversely stranded.

        Info: Featurecount Options

        Please refer to your Genome Annotation File (.gtf), and confirm that these settings match your data. For Gene Attribute, be sure that the key gene_id is written before the identifier of each gene in the attributes column.
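One way to confirm that the Feature Type and Gene Attribute settings match an annotation file is to tally the feature types it contains and count the records that lack the chosen attribute. The sketch below is illustrative only and assumes a standard 9-column, tab-separated GTF with attributes written as `key "value";`. The function name is hypothetical.

```python
import re
from collections import Counter

# Matches GTF attribute pairs such as: gene_id "AT1G01010";
ATTRIBUTE = re.compile(r'(\w+)\s+"[^"]*"')


def summarize_gtf(lines, feature_type="exon", attribute="gene_id"):
    """Count feature types and records of `feature_type` missing `attribute`."""
    feature_counts = Counter()
    missing = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # not a well-formed GTF record
        feature_counts[columns[2]] += 1
        if columns[2] == feature_type:
            keys = {match.group(1) for match in ATTRIBUTE.finditer(columns[8])}
            if attribute not in keys:
                missing += 1
    return feature_counts, missing


if __name__ == "__main__":
    import sys

    # Usage: python check_gtf.py annotation.gtf
    with open(sys.argv[1]) as handle:
        counts, missing = summarize_gtf(handle)
    print("feature types:", dict(counts))
    print("exon records without gene_id:", missing)
```

If the chosen feature type never appears in the tally, or many records lack the attribute, the Featurecount settings probably need to be changed to match the file.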

  4. Advanced options
    1. Type of Sequence: Choose either Single End or Paired End
    2. Number of threads (Default is 4)
    3. FPKM cut-off threshold (For RNA-Seq reads only with Hisat2) (Default is 0)
    4. Coverage cut-off threshold (For RNA-Seq reads only with Hisat2) (Default is 0)
    5. Choose RNA strandedness (default is unstranded) 
    6. Trim bases from 5' end of read: Trim bases from 5' (left) end of each read before alignment (Default is 0)
    7. Trim bases from 3' end of read: Trim bases from 3' (right) end of each read before alignment (Default is 0)
    8. Minimum intron length: Set minimum intron length (Default is 20)
    9. Maximum intron length: Set maximum intron length (Default is 500000)
    10. Phred64 (Default is Phred33): Check to run Phred64
    11. Run Fastqc
    12. Remove duplicate reads
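For readers who want to relate these options to the underlying aligner, the sketch below assembles the HISAT2 command-line flags that correspond to several of them. This is an assumption about the mapping, not RMTA's actual invocation, which this page does not show. The function name is hypothetical, and the index and read-file arguments are deliberately left out.

```python
def hisat2_args(threads=4, trim5=0, trim3=0, min_intron=20,
                max_intron=500000, phred64=False, strandness=None):
    """Build a partial HISAT2 argument list from RMTA-style advanced options.

    Defaults mirror the defaults listed above. `strandness` is passed to
    --rna-strandness when given (for example "RF" or "FR" for paired-end
    reads); None means unstranded, so the flag is omitted.
    """
    args = [
        "hisat2",
        "-p", str(threads),
        "--trim5", str(trim5),
        "--trim3", str(trim3),
        "--min-intronlen", str(min_intron),
        "--max-intronlen", str(max_intron),
        "--phred64" if phred64 else "--phred33",
    ]
    if strandness is not None:
        args += ["--rna-strandness", strandness]
    return args


if __name__ == "__main__":
    # Example: trim 5 bases from the 5' end and use 8 threads.
    print(" ".join(hisat2_args(threads=8, trim5=5)))
```

Building the arguments as a list rather than a single string keeps each value separate, which avoids quoting problems if the list is later handed to a process launcher.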

      Info: Phred64, Fastqc, and Remove Duplicate reads options

      A Phred quality score indicates how confidently each nucleotide was identified during automated DNA sequencing. Our program uses Phred33 under default settings. Phred64 should be selected only if the quality scores in your sequencing reads are encoded as Phred64.

      FastQC provides the user with both an overview of potential issues with the data and summary graphs highlighting issues such as per base sequence quality and Kmer content. When the Fastqc option has been selected, BAM files are converted back into FASTQ, with mapped and unmapped reads, along with their associated quality scores, retained. This FASTQ file is then used as input for FastQC. If issues are detected at the 5' or 3' end of sequencing reads, RMTA includes additional options for specifically trimming bases off of either end during the next analysis. Sequencing reads of overall poor quality will simply not be mapped and therefore do not need to be trimmed.

      If the user chose Bowtie2 as the read aligner and "Remove duplicate reads" as an additional option, then the RMTA_Output folder will only contain a sorted BAM file with duplicates removed for each SRA/FASTQ input file, as well as a mapped.txt file. No additional files will be generated.
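If it is unclear whether a FASTQ file uses Phred33 or Phred64 quality encoding, the lowest quality character in the file gives a useful hint. The following is a heuristic sketch, not part of RMTA: characters below ASCII 59 occur only in Phred33 data, while a file whose lowest character is ASCII 64 or higher is probably, but not certainly, Phred64. The function names are hypothetical, and the sketch assumes the common four-lines-per-record FASTQ layout.

```python
def quality_lines(fastq_lines):
    """Return every fourth line of a 4-line-per-record FASTQ (the quality strings)."""
    return [line for i, line in enumerate(fastq_lines) if i % 4 == 3]


def guess_phred_offset(qualities):
    """Guess the quality offset from the lowest ASCII code observed.

    Returns 33, 64, or None when the evidence is ambiguous or absent.
    A result of 64 is only a likely guess: high-quality Phred33 reads
    can also lack low characters, so check more reads when in doubt.
    """
    codes = [ord(ch) for line in qualities for ch in line.strip()]
    if not codes:
        return None
    lowest = min(codes)
    if lowest < 59:
        return 33  # characters this low are impossible under a +64 offset
    if lowest >= 64:
        return 64  # consistent with Phred64
    return None  # 59-63: ambiguous (older Solexa-style encoding range)


if __name__ == "__main__":
    import sys

    # Usage: python guess_phred.py reads.fastq
    with open(sys.argv[1]) as handle:
        print(guess_phred_offset(quality_lines(handle)))
```

Only check the Phred64 box when the guess is 64 and the sequencing platform's documentation agrees.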


      Info: When using Bowtie2

      When using Bowtie2, be sure to check the box labeled "Remove duplicate reads," as shown in the figure below.



  5. RMTA_Output
    1. Name of the output folder (Default is RMTA_Output)


  6. README
    1. HISAT2 and BOWTIE2 are fast and sensitive alignment programs for mapping next-generation sequencing reads (both DNA and RNA). StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. Cuffcompare compares your assembled transcripts to a reference annotation and tracks Cufflinks transcripts across multiple experiments (e.g. across a time course).

...