Overview

Genome submissions consist of genomic DNA sequences representing either incomplete or complete genomes from both prokaryotes and eukaryotes. Incomplete genomes (or incomplete chromosomes of prokaryotes or eukaryotes) are those submissions derived from data created by whole genome shotgun (WGS) sequencing methods or traditional clone-based sequencing, respectively. WGS projects may be annotated, but annotation is not required. Complete genomes are those genomes of prokaryotes or eukaryotes in which each chromosome is a single sequence without gaps or Ns that represent gaps.

This workflow enables CyVerse users to submit incomplete genomes to the NCBI Whole Genome Shotgun (WGS) division only. If you are submitting complete genomes to the NCBI (complete prokaryotic genomes, or complete eukaryotic genomes or chromosomes), see the table below for more information.

Complete prokaryotic genomes

GenBank Prokaryotic Genomes: Records retrievable from the Nucleotide Database

GenBank archives complete prokaryotic genomes with submitter-supplied annotations. Alternatively, submitters can now request automated NCBI annotation of sequences as part of the submission process. NCBI has a Prokaryotic Genomes Annotation Pipeline that may be requested when genome files are submitted to GenBank. This pipeline generates a submission-ready annotated file that the submitter can edit prior to data release. For more information, read about the Prokaryotic Genomes Annotation Pipeline.

Complete eukaryotic genomes or chromosomes

GenBank Eukaryotic Genomes and Chromosomes: Records retrievable from the Nucleotide Database

GenBank accepts submission of complete eukaryotic chromosomes or complete genomes with submitter-supplied annotations. Complete genomes, with each of the chromosomes in a single sequence, should be submitted to GenBank as a complete genome. The most common complete genomes are bacteria, archaea, and fungi. For GenBank, a complete genome is defined by its chromosomes, although any plasmids that are isolated with the chromosomes should be submitted too. As of July 2013, these sequences are allowed to contain gaps and are not required to include annotation. However, submitters need to know what kinds of gaps and linkage evidence are present, as described in Gapped Format for Genome Submissions. For information about annotating genomes, see the prokaryotic annotation guide or eukaryotic annotation guide.


If you are unsure about the type of data submitted to the WGS division, visit the WGS List for example projects.

Types of WGS submissions

Both WGS and non-WGS genomes can be submitted via this workflow. You will be asked to choose whether or not the genome being submitted is considered WGS. The differences for GenBank purposes are:

Non-WGS

WGS

For both non-WGS and WGS

Standard Submission Scenarios

There are two main formats for WGS submissions:

Prerequisites

  1. Carefully read this tutorial.
  2. Review the example Input data, Output data, and metadata for this tutorial in the Discovery Environment Data window in Community Data -> iplantcollaborative -> example_data -> WGS_submission.
  3. You must have an NCBI account to submit. You can obtain an NCBI account here.
  4. You must have used your NCBI account credentials to log into the WGS submitter system at least once to submit from CyVerse. To ensure that you have logged in to the WGS submitter system, go to the WGS homepage.
  5. Be aware that submission is not complete until you receive final notification from the WGS that your data have been received, processed, and will be released on the specified date.
  6. For help interpreting submission errors in WGS notification emails, email the WGS help desk at genomes@ncbi.nlm.nih.gov.

  7. For help with issues within the CyVerse Discovery Environment, or to provide feedback, email support@cyverse.org.

Basic workflow

WGS submission steps (for advanced users)

An example of submission package metadata is in the Discovery Environment Data window under Community Data -> iplantcollaborative -> example_data -> WGS_submission

Step 1: Create an NCBI WGS submission folder in the DE

The submission package is created using tools in the DE. The submission package has three levels: BioProject, BioSample, and Library. Package organization is similar to the SRA organization detailed in the NCBI Quick Start Guide.

Step 2: Add metadata to every folder in the submission package, save the metadata to a file, and validate the metadata file

BioProject, BioSample, and Library metadata are entered using metadata templates in the DE. 

Step 3: Create the submission template

Use the metadata file saved in Step 2 to create the submission template (.sbt) with the TEST - meta2tbl app in the DE.

Step 4: Convert fasta to sqn file format

Run tbl2asn-gapped-25.3 or tbl2asn-ungapped-25.3, depending on the type of your WGS submission, along with the submission template generated in Step 3 to convert the fasta files to sqn format. Check the output of the Validation and Discrepancy Report, and fix any problems.
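If you are unsure whether your assembly counts as gapped, a quick local check for runs of Ns in the fasta file can help you pick the app. The following is a minimal sketch, not part of the DE workflow: the function names are hypothetical, and the threshold of 10 consecutive Ns is an assumption chosen for illustration, so confirm the gap definition that applies to your submission with NCBI's gapped-format guidance.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumption: a run of at least this many Ns is treated as a gap.
MIN_GAP_RUN = 10


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text lines into {sequence_id: sequence}."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def find_n_runs(sequence: str, min_run: int = MIN_GAP_RUN) -> List[Tuple[int, int]]:
    """Return (start, length) for every run of N/n at least min_run long."""
    pattern = re.compile(r"[Nn]{%d,}" % min_run)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(sequence)]


def is_gapped(records: Dict[str, str], min_run: int = MIN_GAP_RUN) -> bool:
    """True if any sequence contains a run of Ns long enough to count as a gap."""
    return any(find_n_runs(seq, min_run) for seq in records.values())


if __name__ == "__main__":
    example = [">contig1", "ACGT" + "N" * 12 + "ACGT", ">contig2", "ACGTACGT"]
    records = read_fasta(example)
    for name, seq in records.items():
        print(name, find_n_runs(seq))
    print("gapped:", is_gapped(records))
```

To check a real file, pass an open file handle to `read_fasta`, for example `read_fasta(open("genome.fsa"))`, where `genome.fsa` stands in for your own file name.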

Step 5: Move the sqn files into the Library subfolder

Move the sqn file generated in Step 4 into the Library folder under BioProject -> BioSample, then save the BioProject metadata to a file.

Step 6: Submit to WGS

Run the TEST - NCBI_WGS_Submit app to submit to the WGS. Make sure you uncheck the "Validate metadata file only" checkbox. The app will both create the submission.xml metadata file and transfer all sequence files to the WGS.

Step 7: Download the report

If error correction and resubmission are needed, the WGS-generated report can be retrieved with the TEST - NCBI_Report_Download app. Corrections to the submission package can be made within the DE, and resubmission follows the same process as above.  

Detailed WGS submission Steps (for beginners)

An example of submission package metadata is in the Discovery Environment Data window under Community Data -> iplantcollaborative -> example_data -> WGS_submission.

Step 1: Create and organize the submission package

The submission package is created using tools in the DE. Submission packages have three levels: BioProject, BioSample, and Library. Package organization is similar to the SRA organization detailed in the NCBI Quick Start Guide.  

Until the next DE release, the submission package is the same for both SRA and WGS.

A WGS submission package contains a BioProject folder with one or more BioSample folders, each of which contains one or more Library folders, and each Library folder contains one or more sequence files. Use the Discovery Environment (DE) Create NCBI SRA Submission Folder tool to create the submission package (see figure below).

  • BioProject, to cluster the data from the same research project. Each genome must belong to a BioProject. Genomes sequenced as part of the same research effort can belong to a single BioProject, either ‘multiisolate’ or ‘multispecies’.
  • BioSample, to provide detailed information about the sample that was sequenced (see BioSample info and BioSample packages).
  • Library, to provide details of the WGS library.
  1. From the DE Data window, create a submission folder at File -> Create -> Create NCBI SRA Submission Folder.
  2. Enter information on the number of BioSamples and Libraries.

  3. Name the top-level BioProject folder (click the link for more information on NCBI BioProjects).

  4. Assign each genome to a BioProject. Genomes sequenced as part of the same research effort can belong to a single BioProject.

  5. Enter the total number of BioSamples in your submission (click the link for more information on NCBI BioSamples).

  6. Enter the largest number of sample-specific sequencing libraries among your BioSamples. For example, if you have two BioSamples and one of them has one library and the other has two, enter ‘2’ for the number of libraries. If you have more Libraries for some BioSamples than others, this will generate some empty Library folders in the next step. You can remove these empty Library folders, or ignore them.

  7. Raw reads should be submitted to the SRA.
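The package layout described in this step can be sketched locally as follows. This is only an illustration of the three-level hierarchy and of why entering the largest library count produces empty Library folders. The folder names (`BioSample_1`, `Library_1`, and so on) and the function names are hypothetical placeholders, not the names the DE tool assigns.

```python
import tempfile
from pathlib import Path
from typing import List


def build_package(root: Path, bioproject: str,
                  n_biosamples: int, max_libraries: int) -> List[Path]:
    """Create BioProject/BioSample/Library folders and return the Library paths.

    Folder names are hypothetical placeholders, not the names the DE assigns.
    """
    libraries = []
    for sample in range(1, n_biosamples + 1):
        for lib in range(1, max_libraries + 1):
            path = root / bioproject / f"BioSample_{sample}" / f"Library_{lib}"
            path.mkdir(parents=True, exist_ok=True)
            libraries.append(path)
    return libraries


def empty_libraries(libraries: List[Path]) -> List[Path]:
    """Return Library folders that contain no files (safe to remove or ignore)."""
    return [path for path in libraries if not any(path.iterdir())]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        libs = build_package(Path(tmp), "MyBioProject",
                             n_biosamples=2, max_libraries=2)
        # BioSample_1 has two libraries; BioSample_2 has only one.
        for lib in libs[:3]:
            (lib / "genome.sqn").touch()
        for path in empty_libraries(libs):
            print("empty:", path.relative_to(tmp))
```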

Step 2: Enter metadata at each level of the submission package, save a BioProject metadata file, and validate the metadata file

  1. Add metadata to every folder in the submission package. BioProject, BioSample, and Library metadata are entered using metadata templates in the DE. After all metadata has been added, save a single metadata file from the BioProject-level folder.
  2. Enter metadata via the pulldown templates for each folder level (BioProject, BioSample, Library):
    1. Input: Submission package created in Step 1.
    2. Output: Metadata file saved from the top-level BioProject folder in the submission package.

  3. Three metadata templates will be used to add metadata to the submission package: BioProject, BioSample, and Library, successively:

    1. For the BioProject folder, select the "NCBI BioProject Creation WGS" metadata template and fill in all the fields.
    2. For the BioSample folder(s), select the "NCBI BioSample - Plant WGS" metadata template and fill in all the fields.
    3. For the Library folder(s), select the "NCBI WGS Library" metadata template and fill in all the fields.
  4. When entering a contact email on the BioProject metadata template, enter the email address associated with your NCBI account or you will not receive WGS email notifications on the status of your submission.
  5. To remove a metadata template, click the blue Remove Template button at the top of the template. This removes all metadata from that template.
  6. If you plan to submit a large number of BioSamples and/or Libraries, see the documentation for adding metadata templates in bulk.
  7. Alternatively, at the BioSample and Library submission package levels, enter metadata that applies to multiple folders first, then copy it to all folders at that level. Metadata will be copied from the folder selected when the Copy Metadata function is chosen. For more information, see the CyVerse wiki page for metadata copying. If one of the required metadata fields is not shared, you can enter a placeholder so that you can save the template contents for copying, and then edit that field for each folder. After copying, use the ‘Edit metadata’ function to add additional metadata to each folder.  
  8. See http://www.ncbi.nlm.nih.gov/biosample/docs/packages/ for help determining the appropriate BioSample type for your data.
  9. If you require BioSample templates for variants of MIMS, MIGS, or MIMARKS data, please send the request to support@cyverse.org.
  10. Only submission package folders have metadata. Do not add metadata to the sequence files.
  11. Any changes to folder names, file names, or metadata require that you save a new metadata file before submission.
  12. For the top-level, or BioProject, folder in the submission package, select the NCBI BioProject Creation WGS BioProject Metadata template, and enter metadata (metadata template tutorial):

       

     

Step 3: Create the submission template

Step 4: Convert fasta to sqn

Tbl2asn is a command-line program that automates the creation of sequence records for submission to GenBank. It uses many of the same functions as Sequin but is driven by data files. Tbl2asn generates .sqn files for WGS submission using the template generated in Step 3. Depending on whether your genome is gapped or ungapped, choose either the tbl2asn-gapped-25.3 or the tbl2asn-ungapped-25.3 DE app.
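For readers who want a sense of what the two DE apps roughly correspond to on the command line, the sketch below assembles a tbl2asn argument list without running it. This tutorial does not state the parameters the DE apps actually pass, so the flag choices here are assumptions drawn from commonly documented tbl2asn options (`-t` template, `-p` input directory, `-a` input type, `-l` linkage evidence, `-V` validation, `-Z` discrepancy report, `-M n` genome flag set). The function name and the file names are hypothetical. Check every flag against the tbl2asn documentation before relying on it.

```python
from typing import List


def tbl2asn_command(template: str, input_dir: str, gapped: bool) -> List[str]:
    """Build (but do not run) a tbl2asn argument list.

    Assumption: flags follow commonly documented tbl2asn usage; the DE apps
    may use different options.
    """
    cmd = [
        "tbl2asn",
        "-t", template,    # submission template (.sbt) from Step 3
        "-p", input_dir,   # directory holding the fasta (.fsa) files
        "-M", "n",         # "normal" genome flag set used in NCBI examples
        "-V", "v",         # run the validator
        "-Z", "discrep",   # write the discrepancy report to a file named discrep
    ]
    if gapped:
        # Assumption: runs of 10 or more Ns are gaps, linked by paired-end evidence.
        cmd += ["-a", "r10k", "-l", "paired-ends"]
    else:
        # Assumption: the input is a plain set of fasta sequences.
        cmd += ["-a", "s"]
    return cmd


if __name__ == "__main__":
    print(" ".join(tbl2asn_command("template.sbt", "fasta_dir", gapped=True)))
    print(" ".join(tbl2asn_command("template.sbt", "fasta_dir", gapped=False)))
```

The linkage evidence value should reflect how your scaffolds were actually built; `paired-ends` is only one possibility.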

Step 5: Move sequence files to the submission package and save a BioProject metadata file

        

        

Step 6: Submit the submission package to the WGS     

After you submit, the submission package is validated by the WGS system. The WGS sends email notifications to the contact email entered in the BioProject metadata to confirm successful submission or to communicate submission errors.

What happens at WGS? CyVerse systems connect to WGS systems and create the submission folder on the WGS side. Files are transferred, and a submit.ready file is sent to the WGS to signal that the submission package is complete and processing can begin. The WGS system validates the submission package and generates a report.xml file containing any errors detected. The WGS system sends notification email(s) to the contact email provided in the BioProject metadata template, and to the CyVerse team, to report either a successful or failed submission. The first email will be titled "Submission ownership transfer". Follow the instructions in that email to transfer ownership of the submission to the NCBI user included in the package metadata. After ownership transfer, you can view the submission progress at https://submit.ncbi.nlm.nih.gov/subs/. You may need to log in with the NCBI credentials for the account you used in the submission metadata. After you receive further notification from the WGS, if there are errors, you can retrieve the submission report.xml file from WGS servers with the "TEST - NCBI_Report_Download" App in the DE, make corrections, and resubmit (see below).
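The role of the submit.ready file can be illustrated with a small ordering sketch: the marker must go last, because it tells the receiving side that everything else has arrived. This is a hypothetical illustration of that idea, not the code the DE app runs. The function name is invented, and placing submission.xml after the sequence files is an assumption.

```python
from typing import Iterable, List

METADATA_FILE = "submission.xml"   # created by the submit app
READY_MARKER = "submit.ready"      # signals that the package is complete


def plan_transfer(sequence_files: Iterable[str]) -> List[str]:
    """Return an upload order that ends with the ready marker.

    Assumption: sequence files go first, then submission.xml; only the
    position of submit.ready (last) is implied by the description above.
    """
    reserved = {METADATA_FILE, READY_MARKER}
    files = sorted(name for name in sequence_files if name not in reserved)
    return files + [METADATA_FILE, READY_MARKER]


if __name__ == "__main__":
    for name in plan_transfer(["lib2.sqn", "lib1.sqn"]):
        print(name)
```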

Step 7: Download the report  

If error correction and resubmission are needed, the WGS-generated error report can be retrieved with the "TEST - NCBI_Report_Download" App. Use this report to correct the errors and resubmit.  Corrections to the submission package can be made within the DE by updating the submission package organization or metadata, and resubmitting beginning with Step 4.

      

      

  1. Queued: Picked up by the target database. Automated validations can be run, but no curator is assigned yet.
  2. Processing: Transformation, curation, and loading.
  3. Processed-ok: Processing completed successfully; objects are accessioned and loaded in the archive. No further resubmissions for this action will be processed. Accessions are not necessarily public yet.
  4. Processed-error: Processing completed with error(s). Some objects can be accessioned and loaded, while some may be waiting for corrections from the user.
  5. Deleted: Action is deleted and no work on it is expected. This could be due to a duplicate, error, etc.
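Once you have downloaded report.xml, the status values listed above can be pulled out with a few lines of Python. The sketch below is hedged heavily: this tutorial does not show the report's schema, so the element and attribute names (`Action`, `status`, `action_id`, `Message`) and the embedded sample report are assumptions made for illustration. Open your own report.xml and adjust the names to match what you see.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Short meanings of the status values described in this tutorial.
STATUS_MEANING = {
    "queued": "picked up by the target database; no curator assigned yet",
    "processing": "transformation, curation, and loading",
    "processed-ok": "completed successfully; objects accessioned and loaded",
    "processed-error": "completed with error(s); corrections may be needed",
    "deleted": "action deleted; no further work expected",
}

# Hypothetical sample; the real report.xml layout may differ.
SAMPLE_REPORT = """
<SubmissionStatus status="processed-error">
  <Action action_id="example-action-1" status="processed-error">
    <Response status="processed-error">
      <Message severity="error">Example error text</Message>
    </Response>
  </Action>
  <Action action_id="example-action-2" status="processed-ok"/>
</SubmissionStatus>
"""


def summarize_report(xml_text: str) -> List[Tuple[str, str, List[str]]]:
    """Return (action_id, status, messages) for each Action element."""
    root = ET.fromstring(xml_text)
    summary = []
    for action in root.iter("Action"):
        status = action.get("status", "unknown").lower()
        messages = [(m.text or "").strip() for m in action.iter("Message")]
        summary.append((action.get("action_id", ""), status, messages))
    return summary


def needs_correction(status: str) -> bool:
    """Only processed-error calls for fixing the package and resubmitting."""
    return status.lower() == "processed-error"


if __name__ == "__main__":
    for action_id, status, messages in summarize_report(SAMPLE_REPORT):
        meaning = STATUS_MEANING.get(status, "unrecognized status")
        print(action_id, status, "-", meaning)
        for text in messages:
            print("   ", text)
```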

 

If you encounter any issues during WGS submission, please send an email to support@cyverse.org.