The applications listed here are available for use in the Discovery Environment and are documented in the Discovery Environment Manual.

Discovery Environment Applications List


Alert:

 

The iPlant App Store is currently being restructured, and apps are being moved to an HPC environment. During this transition, users may occasionally be unable to locate or use apps that are listed in our tutorials. In many cases, these apps can be found with the search bar at the top of the Apps window in the DE. To improve your chances of a successful search, search only for the part of the name that refers to the app's function or origin rather than the full app name and version number (e.g. 'SOAPdenovo' instead of 'SOAPdenovo-Trans 1.01'). In critical cases, please report your concern to the iPlant Ask forum or to support@iplantcollaborative.org. Thank you for your patience.

The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.

Please work through the tutorial and add your comments at the bottom of this page, or email them to upendra@cyverse.org. Thank you.

Rationale and background:

Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, and Evgeny M. Zdobnov (Zdobnov's Computational Evolutionary Genomics Group)

Bioinformatics, published online June 9, 2015 (doi: 10.1093/bioinformatics/btv351)

BUSCO (Benchmarking Universal Single-Copy Orthologs) is a tool that provides measures for quantitative assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB. BUSCO assessments are implemented in open-source software, with comprehensive lineage-specific sets of Benchmarking Universal Single-Copy Orthologs for arthropods, vertebrates, metazoans, fungi, eukaryotes, and bacteria. These conserved orthologs are ideal candidates for large-scale phylogenomics studies, and the annotated BUSCO gene models built during genome assessments provide a comprehensive gene predictor training set for use in genome annotation pipelines. BUSCO assessments offer intuitive metrics, based on expectations of gene content from hundreds of species, to gauge the completeness of rapidly accumulating genomic data and satisfy an Iberian's quest for quality: "Busco calidad/qualidade". The software is freely available for download at http://busco.ezlab.org/.


Pre-Requisites

  1. A CyVerse account. (Register for a CyVerse account at user.cyverse.org.)
  2. Mandatory arguments
    1. Output folder name (name used for the run; it is also appended to all temporary files)
    2. Input file (genome assembly, gene set, or transcript set in FASTA format)
    3. Lineage data (location of the BUSCO lineage data to use. You can select the BUSCO profile files for your species of interest from the Data window under Community Data -> iplantcollaborative -> example_data -> BUSCO.sample.data)
    4. Mode of analysis (genome for genome assemblies, protein for gene sets, or trans for transcriptomes; default: genome)
  3. Optional arguments
    1. threads (number of CPUs to use for the job; maximum: 4)
    2. species (choose from the list; if your species is not listed, selecting a closely related species usually produces better results)
    3. e-value (custom BLAST e-value cutoff; default: 0.001)
    4. region_limit (number of candidate regions, i.e. contigs or transcripts, to consider per BUSCO; default: 3)
    5. augustus_parameters (additional parameters for fine-tuning the Augustus run. Do not use this option to set the species; use the species argument instead. Enclose the parameters in single quotes, for example '--param1=1 --param2=2'; see the Augustus documentation for available options)
    6. Force tblastn (force tblastn to run on a single core and ignore the threads argument for this step only; useful if inconsistencies are noticed when using multiple threads; default: Off)
    7. long (perform full optimization of the Augustus gene-finding training; default: Off)
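The argument list above amounts to a small configuration schema: three mandatory inputs, a fixed set of modes, a thread cap of 4, and a few documented defaults. The sketch below encodes those constraints so they can be checked before submitting a job. It is only an illustration: the class and field names are hypothetical and do not correspond to the DE app's internal parameter IDs, and the default of 1 thread is an assumption, since this page only states the maximum.

```python
from dataclasses import dataclass
from typing import List, Optional

# Constraints taken from the argument descriptions; names here are hypothetical.
VALID_MODES = {"genome", "protein", "trans"}
MAX_THREADS = 4


@dataclass
class BuscoAppSettings:
    output_name: str               # mandatory
    input_fasta: str               # mandatory
    lineage: str                   # mandatory
    mode: str = "genome"           # documented default
    threads: int = 1               # ASSUMED default; only the maximum (4) is documented
    species: Optional[str] = None
    evalue: float = 0.001          # documented default
    region_limit: int = 3          # documented default
    force_tblastn: bool = False    # documented default: Off
    long: bool = False             # documented default: Off

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the settings look usable."""
        found = []
        for name in ("output_name", "input_fasta", "lineage"):
            if not getattr(self, name):
                found.append(f"{name} is mandatory")
        if self.mode not in VALID_MODES:
            found.append(f"mode must be one of {sorted(VALID_MODES)}")
        if not 1 <= self.threads <= MAX_THREADS:
            found.append(f"threads must be between 1 and {MAX_THREADS}")
        if self.evalue <= 0:
            found.append("e-value must be positive")
        if self.region_limit < 1:
            found.append("region_limit must be at least 1")
        return found
```

For example, `BuscoAppSettings("run_example", "target.fa", "example", species="fly").problems()` returns an empty list, while setting `threads=8` reports the 4-thread limit.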

Test with sample data

Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> BUSCO.sample.data 

Run BUSCO with the following input data:

  1. Output folder - run_example
  2. Input file - target.fa 
  3. lineage data - example
  4. mode - genome (default)
  5. species - fly
  6. e-value - 0.001 (default)
  7. region_limit - 3 (default) 
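The short summary produced by a run records the equivalent command-line call under "To reproduce this run". The sketch below rebuilds that kind of call from the test inputs, using only the flags that appear in that reproduction line (-i, -o, -l, -m, -c, --limit, -sp). The function name is hypothetical, the script path is copied from the reproduction line, and the e-value is left at its default because its flag is not shown on this page.

```python
import shlex


def busco_command(input_fasta, output_name, lineage, mode="genome",
                  cpus=4, region_limit=3, species=None,
                  script="/busco/scripts/run_BUSCO.py"):
    """Build a run_BUSCO.py argument list mirroring the 'To reproduce this run' line."""
    args = [
        "python", script,
        "-i", input_fasta,            # input FASTA file
        "-o", output_name,            # run / output name
        "-l", lineage,                # lineage dataset directory
        "-m", mode,                   # analysis mode
        "-c", str(cpus),              # number of CPUs
        "--limit", str(region_limit), # candidate regions per BUSCO
    ]
    if species:
        args += ["-sp", species]      # Augustus species, only when one is chosen
    return args


# Test-run values from the list above (lineage given as a directory path).
print(shlex.join(busco_command("target.fa", "run_example", "example/", species="fly")))
```

Returning a list rather than a single string keeps each argument intact if it is later passed to something like `subprocess.run`; `shlex.join` is used only to display it as a shell-safe command line.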

Results 

Successful execution of the BUSCO assessment pipeline creates a directory named run_example along with a logs directory. The run_example directory contains several files and directories:

1- Files

  1. short_summary_run_sample.txt - Contains a plain text summary of the results in BUSCO notation. Also gives a brief breakdown of the metrics.

    # BUSCO version is: 3.0.2
    # The lineage dataset is: sample dataset BUSCO 2.0 (Creation date: 07.10.2016, number of species: 23, number of BUSCOs: 10)
    # To reproduce this run: python /busco/scripts/run_BUSCO.py -i target.fa -o run_sample -l example/ -m genome -c 4 --limit 3 -sp fly
    #
    # Summarized benchmarking in BUSCO notation for file target.fa
    # BUSCO was run in mode: genome

    C:80.0%[S:80.0%,D:0.0%],F:0.0%,M:20.0%,n:10

    8 Complete BUSCOs (C)
    8 Complete and single-copy BUSCOs (S)
    0 Complete and duplicated BUSCOs (D)
    0 Fragmented BUSCOs (F)
    2 Missing BUSCOs (M)
    10 Total BUSCO groups searched

  2. full_table_run_sample.txt - Contains the complete results in tabular format: scores and lengths of BUSCO matches, plus coordinates (genome mode) or gene/protein IDs (transcriptome or proteins mode).
  3. missing_busco_list_run_sample.tsv - Contains a list of missing BUSCOs.
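The one-line summary in short_summary_run_sample.txt packs the counts into percentages of n, the number of BUSCO groups searched. Complete (C) is the sum of single-copy (S) and duplicated (D), and C + F + M should total 100%. In the sample run, 8 of 10 BUSCOs are complete, giving C:80.0%. The sketch below parses that line with a regular expression written for the format shown above (other BUSCO versions may format it differently) and checks the arithmetic, allowing a small tolerance because the percentages are rounded to one decimal place. The function names are hypothetical.

```python
import re

# Matches e.g. "C:80.0%[S:80.0%,D:0.0%],F:0.0%,M:20.0%,n:10"
NOTATION = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_notation(line: str) -> dict:
    """Extract the C/S/D/F/M percentages and the group count n from a summary line."""
    match = NOTATION.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO notation line: {line!r}")
    result = {key: float(value) for key, value in match.groupdict().items() if key != "n"}
    result["n"] = int(match.group("n"))
    return result


def is_consistent(p: dict, tol: float = 0.2) -> bool:
    """Check C == S + D and C + F + M == 100, within a rounding tolerance."""
    return (abs(p["C"] - (p["S"] + p["D"])) <= tol
            and abs(p["C"] + p["F"] + p["M"] - 100.0) <= tol)


summary = parse_busco_notation("C:80.0%[S:80.0%,D:0.0%],F:0.0%,M:20.0%,n:10")
complete_count = round(summary["C"] / 100 * summary["n"])  # back to a count of BUSCOs
```

Converting a percentage back to a count with `round(pct / 100 * n)` recovers the figures listed under the notation line, such as 8 complete and 2 missing BUSCOs.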

2- Directories

  1. augustus_output - Augustus-predicted genes; only created during genome assessment.
    1. File: augustus.log - full details on Augustus jobs
    2. File: training_set_XXXX.txt - genes used for Augustus training
    3. Folder: predicted_genes - raw Augustus gene output
    4. Folder: extracted_proteins - Augustus protein FASTA output
    5. Folder: retraining_parameters - Augustus training results
    6. Folder: gb - complete BUSCOs in GenBank format
    7. Folder: gffs - complete BUSCOs in General Feature Format
  2. blast_output - tBLASTn results; not created when assessing proteins.
    1. File: tblastn_XXXX.txt - tabular tBLASTn results
    2. File: coordinates_XXXX.txt - locations of BUSCO matches (genome mode)
  3. hmmer_output - Tabular-format HMMER output of searches with BUSCO HMMs.
  4. single_copy_busco_sequences - A FASTA file for each complete single-copy BUSCO identified: .faa files contain protein sequences; .fna files contain coding (DNA) sequences (genome mode only).
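The complete single-copy BUSCOs are the conserved orthologs that the background section describes as candidates for phylogenomics, so a common first step is to gather their protein sequences from single_copy_busco_sequences. The sketch below assumes that each .faa file is named after its BUSCO ID and holds a single record; both the function names and that naming assumption are hypothetical, and a minimal hand-written FASTA reader stands in for a dedicated library such as Biopython.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_fasta(path: Path) -> List[Tuple[str, str]]:
    """Minimal FASTA reader returning (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def collect_single_copy_proteins(run_dir: str) -> Dict[str, str]:
    """Map each BUSCO ID (ASSUMED to be the .faa file stem) to its protein sequence."""
    seq_dir = Path(run_dir) / "single_copy_busco_sequences"
    proteins = {}
    for faa in sorted(seq_dir.glob("*.faa")):  # .fna (DNA) files are ignored
        records = read_fasta(faa)
        if records:
            proteins[faa.stem] = records[0][1]  # single-copy, so one record expected
    return proteins
```

Running this on the run directories of several assemblies and intersecting the resulting key sets gives the BUSCOs shared by all of them, which could then be aligned gene by gene.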

More information on BUSCO-v2 inputs, outputs and parameters can be found in this manual
