


Overview

Genome submissions consist of genomic DNA sequences representing either incomplete or complete genomes from both prokaryotes and eukaryotes. Incomplete genomes (or incomplete chromosomes of prokaryotes or eukaryotes) are submissions derived from data created by whole genome shotgun (WGS) sequencing methods or traditional clone-based sequencing, respectively. WGS projects may be annotated, but annotation is not required. Complete genomes are genomes of prokaryotes or eukaryotes in which each chromosome is a single sequence without gaps or Ns that represent gaps.

Warning

This workflow enables CyVerse users to submit incomplete genomes to the NCBI Whole Genome Shotgun (WGS) database only. If you are submitting complete genomes to the NCBI (complete prokaryotic genomes, or complete eukaryotic genomes or chromosomes), see the table below for more information.

Complete prokaryotic genomes
GenBank Prokaryotic Genomes: Records retrievable from the Nucleotide Database

GenBank archives complete prokaryotic genomes with submitter-supplied annotations. Alternatively, submitters can now request automated NCBI annotation of sequences as part of the submission process. NCBI has a Prokaryotic Genomes Annotation Pipeline that may be requested when genome files are submitted to GenBank. This pipeline generates a submission-ready annotated file that the submitter can edit prior to data release. For more information, read about the Prokaryotic Genomes Annotation Pipeline.

Start Here
Complete eukaryotic genomes or chromosomes
GenBank Eukaryotic Genomes and Chromosomes: Records retrievable from the Nucleotide Database

GenBank accepts submission of complete eukaryotic chromosomes or complete genomes with submitter-supplied annotations. Complete genomes, with each chromosome in a single sequence, should be submitted to GenBank as a complete genome via GenomesMacroSend. The most common complete genomes are bacteria, archaea, and fungi. Complete genomes are defined for GenBank as the chromosomes, although any plasmids that are isolated with the chromosomes should be submitted too. As of July 2013, these sequences are allowed to contain gaps and are not required to include annotation. However, submitters need to know what kinds of gaps and linkage evidence are present, as described in Gapped Format for Genome Submissions. For information about annotating genomes, see the prokaryotic annotation guide or eukaryotic annotation guide.

Start Here 

...

  • There can still be gaps within the sequences; you will supply information about the gaps during submission.
  • Plasmids and organelles can still be in multiple pieces.
  • Internal sequences must be arranged in the correct order and orientation.
  • Sequences concatenated in unknown order are not allowed.
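
Because gap information has to be supplied during submission, it can help to list the gaps in each sequence beforehand. The following is a minimal sketch, not part of the CyVerse workflow: it assumes gaps are represented as runs of N in a FASTA file, and the function names parse_fasta and find_gaps are hypothetical helpers introduced only for illustration.

```python
import re
from typing import Dict, Iterable, List, Tuple


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence_id: sequence} from FASTA-formatted lines."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence ID is the first word after ">".
            header = line[1:].split()
            current = header[0] if header else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def find_gaps(sequence: str, min_length: int = 1) -> List[Tuple[int, int]]:
    """Return (start, length) for each run of N; start is 1-based.

    Assumption: every run of N (upper or lower case) is treated as a gap.
    """
    return [
        (match.start() + 1, len(match.group()))
        for match in re.finditer(r"[Nn]+", sequence)
        if len(match.group()) >= min_length
    ]


if __name__ == "__main__":
    example = [">contig1 example", "ACGTNNNNNACGT", "ACNNGT", ">contig2", "ACGTACGT"]
    for name, seq in parse_fasta(example).items():
        print(name, find_gaps(seq))
```

The min_length parameter is there because short runs of N may be ambiguous bases rather than true gaps; which threshold applies to a given assembly is a decision for the submitter.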

Standard Submission Scenarios

...

  • Only submission package folders have metadata. Do not add metadata to the sequence files.

  • Use the Metadata Term Guide in the DE for explanations of each metadata term. The guide is located within each template.

  • Three metadata templates will be used to add metadata to the submission package: BioProject, BioSample, and Library.
    • For the BioProject Folder, select the NCBI BioProject Creation WGS metadata template.
    • For the BioSample Folder(s), select the NCBI BioSample - Plant WGS metadata template.
    • For the Library Folder(s), select the NCBI WGS Library metadata template.
  • If you plan to add metadata to a large number of BioSamples and/or Libraries, see the documentation for adding metadata templates in bulk.
  • When entering a contact email on the BioProject metadata template, you must enter the email address associated with your NCBI account in order to receive WGS email notifications on the status of your submission.
  • See http://www.ncbi.nlm.nih.gov/biosample/docs/packages/ for help determining the appropriate BioSample type for your data.

  • Use the TEST - NCBI_WGS DE app to validate the metadata file. For validation, the app will attempt to create a submission.xml metadata file for use by the WGS system, based on the metadata entered into the templates.

Step 3: Create the submission template

Use the metadata file saved in Step 2 to create the submission template (.sbt) using the TEST - meta2tbl app in the DE.

Step 4: Convert FASTA to sqn file format

...

A WGS submission package contains a BioProject folder with one or more BioSample folders, each of which contains one or more Library folders; each Library folder contains one or more sequence files. Use the Discovery Environment (DE) Create NCBI SRA Submission Folder tool to create the submission package (see figure below).

Info
  • BioProject, to cluster the data from the same research project. Each genome must belong to a BioProject. Genomes sequenced as part of the same research effort can belong to a single BioProject, either ‘multiisolate’ or ‘multispecies’.
  • BioSample, to provide detailed information about the sample that was sequenced. See BioSample info and BioSample packages.
  • Library, to provide details of the WGS library.
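
The BioProject, BioSample, and Library nesting described above can be checked on a local copy of a submission package before upload. The sketch below is illustrative only and is not a CyVerse or NCBI tool: check_package is a hypothetical helper, and it assumes that every subfolder of the BioProject folder is a BioSample folder and every subfolder of a BioSample folder is a Library folder.

```python
from pathlib import Path
from typing import List


def check_package(bioproject_dir: Path) -> List[str]:
    """Return findings about a local copy of a submission package.

    Assumed layout: BioProject/BioSample/Library/<sequence files>.
    """
    findings: List[str] = []
    biosamples = sorted(p for p in bioproject_dir.iterdir() if p.is_dir())
    if not biosamples:
        findings.append(f"{bioproject_dir.name}: no BioSample folders")
    for biosample in biosamples:
        libraries = sorted(p for p in biosample.iterdir() if p.is_dir())
        if not libraries:
            findings.append(f"{biosample.name}: no Library folders")
        for library in libraries:
            # A Library folder is expected to hold at least one sequence file.
            if not any(p.is_file() for p in library.iterdir()):
                findings.append(f"{biosample.name}/{library.name}: no sequence files")
    return findings


if __name__ == "__main__":
    import tempfile

    # Build a throwaway package with one filled and one empty Library folder.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "ExampleBioProject"
        (root / "BioSample1" / "Library1").mkdir(parents=True)
        (root / "BioSample1" / "Library2").mkdir()
        (root / "BioSample1" / "Library1" / "assembly.sqn").write_text("placeholder")
        for finding in check_package(root):
            print(finding)
```

An empty Library folder is reported as a finding rather than raised as an error, since empty Library folders in a package can be removed or ignored.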

...

  1. From the DE Data window, create a submission folder at File -> Create -> Create NCBI SRA Submission Folder.
  2. Enter information on the number of BioSamples and Libraries.

  3. Name the top-level BioProject folder (click the link for more information on NCBI BioProjects).

  4. Assign each genome to a BioProject. Genomes sequenced as part of the same research effort can belong to a single BioProject.

  5. Enter the total number of BioSamples in your submission (click the link for more information on NCBI BioSamples).

    • If the same sample is used for two different genome assemblies, use the same BioSample for both.

  6. Enter the largest number of sample-specific sequencing libraries among your BioSamples. For example, if you have two BioSamples and one of them has one library and the other has two, enter ‘2’ for the number of libraries. If you have more Libraries for some BioSamples than others, this will generate some empty Library folders in the next step. You can remove these empty Library folders or ignore them.

  7. Raw reads should be submitted to the SRA:

...

  • After the sqn files have been moved, select the top-level BioProject folder in the submission package and use the ‘Save metadata’ function to save a BioProject metadata file for the submission package. Do not use the same name as in Step 2.
  • This file will serve as input into the WGS submission app in the next step (Step 6).

        

Step 6: Submit the submission package to the WGS     

...

  • Input: CyVerse analysis output folder and submission package in Step 7.
  • Output: Updated submission package.
  • Caveats and suggestions
    • Remember to save a new metadata file from the top level of the submission package before resubmitting. It is best practice to name this file differently from the previous metadata file.

    • During error correction, change only WGS-detected errors. All other changes will be ignored by the WGS during resubmission. If additional changes are required, they can be made using the NCBI website after successful submission.
    • If no report.xml is retrieved after running this app, this does not necessarily mean your submission failed. The WGS system may not have generated it yet. Make sure to wait for notification from the WGS that the submission has been received and processed.

  • To retrieve the submission report, select the “TEST - NCBI_Report_Download” app, and as input, select the CyVerse analysis output folder generated in Step 6. It will be named with your CyVerse username. The report will be fetched from the WGS and placed in a new analysis output folder generated by the retrieval app. To resubmit, make the necessary changes to the submission package data and metadata, resave a BioProject metadata file from the top-level folder of the submission package, and resubmit with the appropriate WGS submission app.

...