The applications listed here are available for use in the Discovery Environment and are documented in: Discovery Environment Manual.

Discovery Environment Applications List


Alert:

 

The CyVerse App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be located by using the search bar at the top of the Apps window in the DE. To increase the chance of a successful search, do not search for the entire app name and version number; search only for the portion that refers to the app's function or origin (e.g., 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01').

Also, as part of the 2.8 app categorization, a number of apps were deprecated and are no longer available, and there is no longer an Archive category. You can search for a suitable replacement in the List of Applications in this window, or search for an app name or the tool used by an app in the Apps window search field. If you need an app reinstated, please contact support@cyverse.org.

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the tutorial and add your comments at the bottom of this page, or send comments by email to support@cyverse.org. Thank you.

Note: Although this version of the app still works, NCBI recommends using the more recent version, the tbl2asn (gapped)-25.3 app in the DE.

Rationale and background: 

If your contig sequences include runs of N's that represent gaps, you will need to include assembly_gap features with the appropriate linkage evidence. If the sequences meet certain requirements, you can generate a gapped submission with tbl2asn using the arguments -l (to add linkage evidence) and -a (to add assembly_gaps), as described below. Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Sequin but is driven generally by data files. Tbl2asn generates .sqn files from a template for submission to GenBank. Additional manual editing is not required before submission.
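For readers who want to see how the -a and -l arguments fit together outside the DE, the following Python sketch assembles a possible tbl2asn argument list. Only -a and -l are named on this page. The -t (template) and -i (FASTA input) flags and the helper name build_tbl2asn_args are assumptions based on standalone tbl2asn usage and should be checked against the help output of your tbl2asn version.

```python
from typing import List


def build_tbl2asn_args(template: str, fasta: str, gap_code: str, linkage: str) -> List[str]:
    """Assemble an illustrative tbl2asn argument list for a gapped submission."""
    return [
        "tbl2asn",
        "-t", template,  # assumed flag: template file (.sbt)
        "-i", fasta,     # assumed flag: nucleotide FASTA file (.fsa)
        "-a", gap_code,  # add assembly_gaps, e.g. "r5k"
        "-l", linkage,   # add linkage evidence, e.g. "paired-ends"
    ]


args = build_tbl2asn_args(
    "template_BP_BS.sbt", "sample.gapped.unknown.fsa", "r5k", "paired-ends"
)
print(" ".join(args))
```

The list form can be passed to subprocess.run without shell quoting. The sketch only builds the arguments and does not run tbl2asn.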

 Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account at user.cyverse.org.)

Mandatory arguments

  1. Template file containing a text ASN.1 Submit-block object (suffix .sbt).
  2. Nucleotide sequence data in FASTA format (suffix .fsa). This must be a single FASTA file, which may contain either one sequence or multiple sequences.
  3. Linkage Evidence: Type of evidence used to assert linkage across the gaps. The available options correspond to the options for column 9 of an AGP file; the test cases on this page use paired-ends and align-genus.
  4. Output file name

Optional arguments

  1. Feature Table or Annotation file (suffix .tbl). [Required only if including annotation]
  2. Structured comment file (suffix .cmt)

Gap details

There are two types of gap lengths:

  • Estimated Gap length: The approximate gap size is known. This is also used if the gap is known to be small (e.g., the gap could be between 10 and 50 N's).
  • Unknown Gap length: The gap size is not known (e.g., the gap could be 50 or 50000 N's), but the order and orientation of the contigs are known. We suggest using 100 N's to represent gaps of unknown length rather than a random number because this allows you to add assembly_gap features using tbl2asn.
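To check which runs of N's in a sequence would fall into each gap type, a small script can help. The Python sketch below is a simplified model of the rules described above, not tbl2asn's own implementation. The function name classify_n_runs and its parameters are hypothetical, and the default threshold of 10 is only an example value.

```python
import re
from typing import List, Tuple


def classify_n_runs(seq: str, min_estimated: int = 10, unknown_len: int = 100,
                    use_unknown: bool = True) -> List[Tuple[int, int, str]]:
    """Return (start, length, kind) for every run of N's in seq.

    kind is "unknown" for runs of exactly unknown_len N's (when enabled),
    "estimated" for other runs of at least min_estimated N's, and
    "ambiguous" for shorter runs that are treated as ambiguous bases.
    """
    runs = []
    for match in re.finditer(r"[Nn]+", seq):
        length = match.end() - match.start()
        if use_unknown and length == unknown_len:
            kind = "unknown"
        elif length >= min_estimated:
            kind = "estimated"
        else:
            kind = "ambiguous"
        runs.append((match.start(), length, kind))
    return runs


example = "ACGT" + "N" * 3 + "ACGT" + "N" * 25 + "ACGT" + "N" * 100 + "ACGT"
for start, length, kind in classify_n_runs(example):
    print(start, length, kind)
```

Running this on your own contigs before submission is one way to confirm that every gap of unknown length is exactly 100 N's.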

Parameters

  1. Master Genome Flags 
  2. Discrepancy Report: Recommended only for annotated genome submissions, complete or WGS. See the Discrepancy Report page for information about its output.
  3. Modifiers for FASTA Definition Lines: Allows the addition of source qualifiers that will be the same for each submission
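Source qualifiers for the FASTA definition line are written as bracketed key=value pairs. As a minimal sketch, the hypothetical Python helper format_source_modifiers below builds such a string from a dictionary; the qualifier names and values are the ones used in the test cases on this page.

```python
from typing import Dict


def format_source_modifiers(modifiers: Dict[str, str]) -> str:
    """Join qualifiers as space-separated [key=value] pairs, in insertion order."""
    return " ".join(f"[{key}={value}]" for key, value in modifiers.items())


mods = format_source_modifiers({
    "organism": "Helicobacter pylori ABC1",
    "strain": "ABC1",
    "host": "Homo sapiens",
    "isolation-source": "blood",
})
print(mods)
```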

Test/sample data:


Test data for tbl2asn (gapped)-22.9 are provided in the Data Store at /iplant/home/shared/iplantcollaborative/example_data/tbl2asn.sample.data

Use the following inputs/outputs and parameters for testing tbl2asn (gapped)-22.9

1. All the gaps are of estimated lengths: Every run of 5 or more Ns represents a gap of estimated length, and the linkage evidence is paired-ends:

Note that you should only include an assembly_gap for runs of N's that represent gaps.  Do not add assembly_gaps for single or short runs of N's that represent ambiguous bases. You will need to check your assembly parameters to determine what the N's represent.

  1. Mandatory arguments
    1. Template file - template_BP_BS.sbt
    2. Fasta file - sample.gapped.unknown.fsa
    3. Linkage evidence - paired-ends (i.e., for paired ends or mate pairs)
    4. Output file - out.gapped.sqn
  2. Optional arguments
    1. Annotation file - multiple.tbl
    2. Structured comment file - assembly.cmt
  3. Gap details
    1. Estimated Gap length - r5k (Runs of 5 or more N's are estimated gaps and shorter runs of N's are ambiguous bases).
  4. Parameters
    1. Organism name - [organism=Helicobacter pylori ABC1] [strain=ABC1] [host=Homo sapiens] [isolation-source=blood]
    2. Master Genome Flag - n (default)
    3. Run Discrepancy report - checked (Recommended)

2. All of the gaps are of unknown length: All gaps are 100 N's and are of unknown length, and the linkage evidence is by alignment to another genome of the same genus:

Note that all of the unknown length gaps must be 100 N's. An assembly_gap will be added for every run of 100 N's.  All other N's will be ignored.  Please contact us for additional instructions if there are unknown length gaps of other sizes. Note that you must know the order and orientation of the contigs.  You cannot randomly link contigs using unknown (or known) length gaps.  If you do not have linkage evidence, submit the sequences as individual contigs.

  1. Mandatory arguments
    1. Template file - template_BP_BS.sbt
    2. Fasta file - sample.gapped.known.fsa
    3. Linkage evidence - align-genus
    4. Output file - out.gapped.sqn
  2. Optional arguments
    1. Annotation file - multiple.tbl
    2. Structured comment file - assembly.cmt
  3. Gap details
    1. Unknown Gap length - r100u (Runs of exactly 100 N's are gaps of unknown length; all other N's are ignored).
  4. Parameters
    1. Organism name - [organism=Helicobacter pylori ABC1] [strain=ABC1] [host=Homo sapiens] [isolation-source=blood]
    2. Master Genome Flag - n (default)
    3. Run Discrepancy report - checked (Recommended)

3. There are both estimated-length and unknown-length gaps: Runs of 10 or more N's are estimated gaps, shorter runs of N's are ambiguous bases, all runs of exactly 100 N's are unknown gaps, and the linkage evidence is paired-ends:

Note that all of the unknown length gaps must be 100 N's. The number in the gap setting (10 in r10u) is the minimum number of N's in a run that will be converted to an estimated length gap. If some runs of 100 N's are unknown length and others are estimated length, please contact us for more information.

  1. Mandatory arguments
    1. Template file - template_BP_BS.sbt
    2. Fasta file - sample.gapped.unknown.fsa
    3. Linkage evidence - paired-ends (i.e., for paired ends or mate pairs)
    4. Output file - out.gapped.sqn
  2. Optional arguments
    1. Annotation file - multiple.tbl
    2. Structured comment file - assembly.cmt
  3. Gap details
    1. Estimated Gap length - r10u
  4. Parameters
    1. Organism name - [organism=Helicobacter pylori ABC1] [strain=ABC1] [host=Homo sapiens] [isolation-source=blood]
    2. Master Genome Flag - n (default)
    3. Run Discrepancy report - checked (Recommended)
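The three test cases use the gap settings r5k, r100u, and r10u. As a rough illustration of how these codes can be read, the Python sketch below splits a code into its minimum run length and a flag for whether runs of exactly 100 N's are treated as unknown-length gaps. This reading follows the descriptions on this page; the names GapRule and parse_gap_code are hypothetical, and the sketch does not cover any other codes that tbl2asn may accept.

```python
import re
from typing import NamedTuple


class GapRule(NamedTuple):
    min_run: int          # shortest run of N's converted to an assembly_gap
    unknown_at_100: bool  # True if runs of exactly 100 N's are unknown-length gaps


def parse_gap_code(code: str) -> GapRule:
    """Interpret codes of the form r<number>k or r<number>u."""
    match = re.fullmatch(r"r(\d+)([ku])", code)
    if match is None:
        raise ValueError(f"unrecognized gap code: {code!r}")
    return GapRule(min_run=int(match.group(1)), unknown_at_100=(match.group(2) == "u"))


for code in ("r5k", "r100u", "r10u"):
    print(code, parse_gap_code(code))
```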

Output Reports:

  1. out.gapped.sqn - sqn file for submission to WGS
  2. multiple.val - validation report
  3. discrep - discrepancy report
  4. errorsummary.val - Summary file showing the number, severity and type of errors found in all the .val files.
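After downloading the results of an analysis, a quick check that all four reports are present can be sketched in Python as follows. The file names are the ones listed above for the test cases; the helper name missing_outputs and the idea of checking a local results folder are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List

EXPECTED_OUTPUTS = ("out.gapped.sqn", "multiple.val", "discrep", "errorsummary.val")


def missing_outputs(result_dir: str, expected: Iterable[str] = EXPECTED_OUTPUTS) -> List[str]:
    """Return the expected output file names that are not present in result_dir."""
    base = Path(result_dir)
    return [name for name in expected if not (base / name).is_file()]


# Example: report which outputs are missing from the current directory.
print(missing_outputs("."))
```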

More information about tbl2asn (gapped)-22.9 can be found at http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/ and http://www.ncbi.nlm.nih.gov/genbank/wgs_gapped/
