The applications listed here are available for use in the Discovery Environment and are documented in: Discovery Environment Manual.

Discovery Environment Applications List

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the documentation and add your comments on the bottom of this page, or email comments to support@cyverse.org.

Rationale and background:

RMTA is a wrapper script built on top of several publicly available bioinformatics tools that proceeds rapidly from raw short-read data to assembled transcripts. RMTA does this by mapping reads with either HISAT2 or Bowtie2 and then assembling transcripts with either Cufflinks or StringTie, according to user preference. RMTA can process FASTQ files containing paired-end or single-end reads, or it can directly process one or more Sequence Read Archive (SRA) entries from NCBI, given their SRA IDs. RMTA has been used successfully by many groups as the first step towards identifying long non-coding RNAs with the Evolinc workflow. More information about RMTA can be found here

RMTA (Read Mapping, Transcript Assembly) is a gene quantification workflow for RNA-Seq data. It uses HTCondor in CyVerse's Discovery Environment for job submission and the Data Store for data management.

RMTA minimally requires the following input data:

  1. Reference genome (FASTA) or HISAT2-indexed reference genome (in a subdirectory)
  2. Reference transcriptome (GFF3/GTF/GFF)
  3. RNA-Seq reads (FASTQ), single-end or paired-end (compressed or uncompressed), or a text file of NCBI SRA IDs (one SRA ID per line).
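
The SRA ID file described above is a plain text file with one ID per line. A minimal sketch of preparing such a file is shown here; the function name `format_sra_id_list` and the IDs in the usage note are hypothetical placeholders, not part of RMTA.

```python
def format_sra_id_list(ids):
    """Return text with one SRA ID per line.

    Blank entries and repeated IDs are dropped; the original order is kept.
    Hypothetical helper for illustration only, not part of RMTA.
    """
    cleaned = []
    for raw in ids:
        sra_id = raw.strip()
        if sra_id and sra_id not in cleaned:
            cleaned.append(sra_id)
    # Every ID ends with a newline so each sits on its own row.
    return "".join(f"{sra_id}\n" for sra_id in cleaned)
```

For example, `format_sra_id_list(["SRR0000001", "SRR0000002"])` (placeholder IDs) returns text that can be saved to a file and uploaded to the Data Store.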


Pre-Requisites:

  1. A CyVerse account (Register for a CyVerse account at https://user.cyverse.org). 

  2. An up-to-date Java-enabled web browser. (Firefox recommended. If you wish to work with your own large datasets and upload them using iCommands, Chrome is not suitable due to its issues in utilizing 64-bit Java.)

  3. Mandatory arguments 

    1. Reference genome 

      Select at least one of the following three options for the Reference Genome

      1. Custom genome
      2. Reference genome from the list
      3. Hisat2 Indexed folder (Preferred type)
    2. Reference annotation

      1. Custom Reference annotation
      2. Reference Annotation from the list

      Only one of the following three read options has to be selected.

    3. Paired-end reads

      1. FASTQ Files (Read 1): HT path list of read 1 files of paired-end data
      2. FASTQ Files (Read 2): HT path list of read 2 files of paired-end data
    4. Single-end reads

      1. Single end FASTQ files: HT path list of read files of single-end data
    5. SRA

      1. File containing SRA IDs: HT path list of files, each listing multiple SRA IDs
    6. Cufflinks/StringTie:

      Select exactly one of the two assemblers below (options 1 and 2). They cannot both be selected.

      1. StringTie
      2. Cufflinks
      3. Coverage cut-off threshold: Select from 0-5 (Default is 2)
      4. FPKM cut-off threshold: FPKM cut-off used to filter the transcripts (Default is 2)
  4. Advanced options
    1. Phred quality score: Encoding of the quality scores. Select Phred64 if your reads use it (Default is Phred33)
    2. Fragment Library Type: Specify the library type, for example FR, RF, F, or R
    3. Trim bases from 5' end of read: Trim bases from 5' (left) end of each read before alignment (Default is 0)
    4. Trim bases from 3' end of read: Trim bases from 3' (right) end of each read before alignment (Default is 0)
    5. Minimum intron length: Set minimum intron length (Default is 20)
    6. Maximum intron length: Set maximum intron length (Default is 500000)
  5. Output
    1. Name of the output folder (Default is Output)
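
The selection rules above can be summarised as a small pre-submission check. This is only a sketch of those rules, not RMTA code: the function name and every dictionary key are hypothetical, and the requirement that the read 1 and read 2 lists have equal length is an assumption added for illustration.

```python
def check_rmta_selection(sel):
    """Return a list of problems found in a dict of form selections.

    An empty list means the selections satisfy the rules described above.
    All key names are hypothetical; they do not come from RMTA itself.
    """
    problems = []

    # At least one reference genome option and one annotation option.
    if not any(sel.get(k) for k in ("custom_genome", "listed_genome", "hisat2_index_folder")):
        problems.append("no reference genome option selected")
    if not any(sel.get(k) for k in ("custom_annotation", "listed_annotation")):
        problems.append("no reference annotation selected")

    # Exactly one of the three read options.
    paired = bool(sel.get("read1_files") or sel.get("read2_files"))
    single = bool(sel.get("single_end_files"))
    sra = bool(sel.get("sra_id_files"))
    if paired + single + sra != 1:
        problems.append("select exactly one of paired-end, single-end, or SRA input")
    # Assumption: every read 1 file needs a matching read 2 file.
    if paired and len(sel.get("read1_files") or []) != len(sel.get("read2_files") or []):
        problems.append("read 1 and read 2 lists differ in length")

    # Exactly one assembler.
    if sel.get("assembler") not in ("StringTie", "Cufflinks"):
        problems.append("select exactly one assembler: StringTie or Cufflinks")

    # Documented range and defaults: coverage 0-5 (default 2), introns 20 to 500000.
    if not 0 <= sel.get("coverage_cutoff", 2) <= 5:
        problems.append("coverage cut-off must be between 0 and 5")
    if sel.get("min_intron", 20) > sel.get("max_intron", 500000):
        problems.append("minimum intron length exceeds maximum intron length")

    return problems
```

Returning a list of messages rather than raising on the first problem lets all issues be reported at once before a job is submitted.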

Test/sample data

The following test data for RMTA are provided in /iplant/home/shared/iplantcollaborative/example_data/OSG-RMTA

  1. Reference Genome: Sorghum_bicolor.Sorbi1.20.dna.toplevel_chr8.fa
  2. Reference Annotation: Sorghum_bicolor.Sorbi1.20_chr8.gtf
  3. Left reads (Read 1): sample_1_R1.fq.gz
  4. Right reads (Read 2): sample_1_R2.fq.gz
  5. Assembler: StringTie
  6. Fragment Library Type: FR

Leave all other options at their default values.

Results 

Successful execution of RMTA will generate two output folders:

  1. Index: This folder contains the index of the genome
  2. Output: This folder contains the output from HISAT2, StringTie and Cuffcompare (please refer to each program's manual for an explanation of its outputs)
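
To illustrate what the coverage and FPKM cut-offs mean for assembled transcripts, the sketch below checks a single GTF transcript line against both thresholds. It assumes a GTF whose transcript lines carry `cov` and `FPKM` attributes, as StringTie's assembled-transcript GTF does. The function name is hypothetical, and whether RMTA itself keeps transcripts that are exactly equal to the cut-off is an assumption.

```python
import re

# Matches GTF attributes of the form: key "value"
ATTRIBUTE = re.compile(r'(\w+) "([^"]*)"')

def passes_cutoffs(gtf_line, min_fpkm=2.0, min_cov=2.0):
    """Return True if a GTF transcript line meets both cut-offs.

    Illustrative only: this is not RMTA's own filter, and keeping values
    equal to the cut-off (>=) is an assumption.
    """
    fields = gtf_line.rstrip("\n").split("\t")
    # A GTF record has 9 tab-separated fields; field 3 is the feature type.
    if len(fields) != 9 or fields[2] != "transcript":
        return False
    attrs = dict(ATTRIBUTE.findall(fields[8]))
    try:
        return float(attrs["FPKM"]) >= min_fpkm and float(attrs["cov"]) >= min_cov
    except (KeyError, ValueError):
        # Missing or non-numeric attributes: treat the line as not passing.
        return False
```

The defaults of 2 mirror the default cut-offs listed in the options above.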