The DE Quick Start tutorial provides an introduction to basic DE functionality and navigation.
Rationale and background:
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs
Felipe A. Simão, Robert M. Waterhouse, Panagiotis Ioannidis, Evgenia V. Kriventseva, & Evgeny M. Zdobnov (Zdobnov’s Computational Evolutionary Genomics Group)
- A CyVerse account. (Register for a CyVerse account here - user.cyverse.org)
- Mandatory arguments
- Output folder name (name to use for the run; it is appended to the names of all temporary files)
- Input file (genome assembly/gene set/transcript set file in FASTA format)
- Lineage data (Location of the BUSCO lineage data to use. You can select the BUSCO profile files for your species of interest from the Data window under Community Data -> iplantcollaborative -> example_data -> BUSCO.sample.data )
- Mode of analysis (genome, protein, or trans for transcriptome. Default: genome)
- Optional arguments
- species (If your species is not in the list, selecting a closely related species usually produces better results)
- e-value (Use a custom blast e-value cutoff. Default: 0.03)
- long (Performs full optimization for Augustus gene finding training. Default: Off)
Test data for this app appears directly in the Discovery Environment in the Data window under Community Data -> iplantcollaborative -> example_data -> BUSCO.sample.data
Execute BUSCO with the following input data:
- Output folder - run_example
- Input file - target.fa
- lineage data - example
- mode - genome (default)
- species - fly
- e-value - 0.03 (default)
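For readers who want to see how these settings might translate to a command-line run, here is a minimal sketch. It assumes the BUSCO v2 script is invoked as `BUSCO.py` and that the options map to the `-i`, `-o`, `-l`, `-m`, `-sp`, `-e` and `--long` flags; the helper name `build_busco_command` is hypothetical, and the DE app may build its command differently.

```python
from typing import List, Optional


def build_busco_command(
    input_file: str,
    output_name: str,
    lineage_dir: str,
    mode: str = "genome",
    species: Optional[str] = None,
    evalue: Optional[float] = None,
    long_mode: bool = False,
    script: str = "BUSCO.py",  # assumed script name, adjust to your install
) -> List[str]:
    """Assemble an argument list for a BUSCO run (illustrative only)."""
    # Mandatory arguments: input file, run name, lineage data, mode.
    cmd = [
        "python", script,
        "-i", input_file,
        "-o", output_name,
        "-l", lineage_dir,
        "-m", mode,  # passed through unchanged
    ]
    # Optional arguments are added only when the caller sets them.
    if species:
        cmd += ["-sp", species]
    if evalue is not None:
        cmd += ["-e", str(evalue)]
    if long_mode:
        cmd.append("--long")
    return cmd


if __name__ == "__main__":
    # Values taken from the example run listed above.
    example = build_busco_command(
        "target.fa", "run_example", "example", species="fly", evalue=0.03
    )
    print(" ".join(example))
```

The function only builds the argument list; passing it to `subprocess.run` would launch the job on a machine where BUSCO and its dependencies are installed.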
Successful execution of the BUSCO assessment pipeline will create a directory named run_example along with a logs directory. The directory will contain several files and directories:
- short_summary_run_sample.txt - Contains a plain text summary of the results in BUSCO notation. Also gives a brief breakdown of the metrics.
- full_table_run_sample.txt - Contains the complete results in a tabular format with scores and lengths of BUSCO matches, and coordinates (for genome mode) or gene/protein IDs (for transcriptome or proteins mode).
- missing_busco_list_run_sample.tsv - Contains a list of missing BUSCOs.
- augustus_output - Augustus-predicted genes, only created during genome assessment.
  - File: augustus.log = full details on Augustus jobs
  - File: training_set_XXXX.txt = genes used for Augustus training
  - Folder: predicted_genes = Augustus raw gene output
  - Folder: extracted_proteins = Augustus protein FASTA output
  - Folder: retraining_parameters = Augustus training results
  - Folder: gb = GenBank format complete BUSCOs
  - Folder: gffs = General Feature Format complete BUSCOs
- blast_output - tBLASTn results, not created for assessment of proteins.
  - File: tblastn_XXXX.txt = tabular tBLASTn results
  - File: coordinates_XXXX.txt = locations of BUSCO matches (genome mode)
- hmmer_output - Tabular format HMMER output of searches with BUSCO HMMs.
- single_copy_busco_sequences - FASTA format file for each complete single-copy BUSCO identified.
  - .faa files contain protein sequences
  - .fna files contain coding sequences (DNA, genome mode only)
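The short summary reports its results in BUSCO notation, which can be read programmatically. The sketch below assumes the one-line form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` (complete, single-copy, duplicated, fragmented, missing, total BUSCOs searched); the function name `parse_busco_notation` and the sample string are made up for illustration and are not output from the example run.

```python
import re
from typing import Dict, Union

# Assumed layout of the BUSCO notation line in the short summary file.
_NOTATION = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_notation(text: str) -> Dict[str, Union[float, int]]:
    """Extract the percentages and BUSCO count from a notation line."""
    match = _NOTATION.search(text)
    if match is None:
        raise ValueError("no BUSCO notation line found")
    # Percentages become floats; the total number of BUSCOs stays an int.
    result: Dict[str, Union[float, int]] = {
        key: float(match.group(key)) for key in ("C", "S", "D", "F", "M")
    }
    result["n"] = int(match.group("n"))
    return result


if __name__ == "__main__":
    # Invented sample line, for illustration only.
    sample = "\tC:90.0%[S:88.0%,D:2.0%],F:6.0%,M:4.0%,n:100"
    print(parse_busco_notation(sample))
```

To use it on a real run, read the contents of the short summary file and pass the text to the function; it searches the whole text, so the surrounding header lines do not need to be stripped first.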
More information on BUSCO-v2 inputs, outputs and parameters can be found in this manual