- From the DE Data window, create a submission folder at File -> Create -> Create NCBI SRA Submission Folder.
Enter information on the number of BioSamples and Libraries.
Name the top-level BioProject folder (click the link for more information on NCBI BioProjects).
Assign each genome to a BioProject. Genomes sequenced as part of the same research effort can belong to a single BioProject.
Enter the total number of BioSamples in your submission (click the link for more information on NCBI BioSamples).
If the same sample is used for two different genome assemblies, use the same BioSample for both.
Enter the largest number of sample-specific sequencing libraries among your BioSamples. For example, if you have two BioSamples and one of them has one library and the other has two, enter ‘2’ for the number of libraries. If you have more Libraries for some BioSamples than others, this will generate some empty Library folders in the next step. You can remove these empty Library folders or ignore them.
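The two numbers requested in the dialog can be worked out ahead of time from a list of your samples. The following is a minimal sketch, assuming a hypothetical mapping of BioSample names to library names (the names and the helper functions are illustrative and not part of the DE):

```python
# Hypothetical layout: BioSample name -> list of its sequencing libraries.
libraries_per_biosample = {
    "BioSample_1": ["lib_A"],
    "BioSample_2": ["lib_B", "lib_C"],
}


def submission_folder_counts(libraries_per_biosample):
    """Return (number of BioSamples, largest library count among them)."""
    n_biosamples = len(libraries_per_biosample)
    max_libraries = max(
        (len(libs) for libs in libraries_per_biosample.values()), default=0
    )
    return n_biosamples, max_libraries


def empty_library_folders(libraries_per_biosample):
    """Return how many Library folders would stay empty under each BioSample."""
    _, max_libraries = submission_folder_counts(libraries_per_biosample)
    return {
        name: max_libraries - len(libs)
        for name, libs in libraries_per_biosample.items()
    }


print(submission_folder_counts(libraries_per_biosample))  # (2, 2)
print(empty_library_folders(libraries_per_biosample))
```

In this example, BioSample_1 ends up with one empty Library folder, which can be removed or ignored as described above.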
- Raw reads should be submitted to the SRA:
Step 2: Enter metadata at each level of the submission package, save a BioProject metadata file, and validate the metadata file
- Add metadata to every folder in the submission package. BioProject, BioSample, and Library metadata are entered using metadata templates in the DE. After all metadata has been added, save a single metadata file from the BioProject-level folder.
- Enter metadata via the pulldown templates for each folder level (BioProject, BioSample, Library):
- Input: Submission package created in Step 1.
Output: Metadata file saved from the top-level BioProject folder in the submission package.
Three metadata templates will be used to add metadata to the submission package: BioProject, BioSample, and Library, successively:
- For the BioProject folder select "NCBI BioProject Creation WGS" metadata template and fill all the fields.
- For the BioSample folder(s) select "NCBI BioSample - Plant WGS" metadata template and fill all the fields.
- For the Library folder(s) select "NCBI WGS Library" metadata template and fill all the fields.
- When entering a contact email on the BioProject metadata template, enter the email address associated with your NCBI account or you will not receive WGS email notifications on the status of your submission.
- Use the Metadata Term Guide in the DE (located within each template) for explanations of each metadata term.
- To remove a metadata template, click the blue Remove Template button at the top of the template. This removes all metadata from that template.
- If you plan to submit a large number of BioSamples and/or Libraries, see the documentation for adding metadata templates in bulk.
- Alternatively, at the BioSample and Library submission package levels, enter metadata that applies to multiple folders first, then copy it to all folders at that level. Metadata will be copied from the folder selected when the Copy Metadata function is chosen. For more information, see the CyVerse wiki page for metadata copying. If one of the required metadata fields is not shared, you can enter a placeholder so that you can save the template contents for copying, and then edit that field for each folder. After copying, use the ‘Edit metadata’ function to add additional metadata to each folder.
- See http://www.ncbi.nlm.nih.gov/biosample/docs/packages/ for help determining the appropriate BioSample type for your data.
- If you require BioSample templates for variants of MIMS, MIGS, or MIMARKS data, please send the request to email@example.com.
- Only submission package folders have metadata. Do not add metadata to the sequence files.
- Any changes to folder names, file names, or metadata require that you save a new metadata file before submission.
- For the top-level, or BioProject, folder in the submission package, select the NCBI BioProject Creation WGS BioProject Metadata template, and enter metadata (metadata template tutorial):
For each BioSample folder in the submission package, select the NCBI BioSample - Plant WGS BioSample Metadata template, and enter metadata (metadata template tutorial):
For each Library folder in the submission package, select the NCBI WGS Library Metadata template and enter metadata (metadata template tutorial). To facilitate metadata entry, enter all shared metadata for a single Library folder and then copy it to all other Library folders. After copying, you can add unique metadata to each Library folder.
Do not add metadata to the sequence files.
After the metadata has been entered, select the top-level BioProject folder in the submission package and use the ‘Save metadata’ function to save a BioProject metadata file for the submission package. This file will serve as input into the WGS submission app in the next step (Step 3):
Once the metadata file has been saved, select the NCBI WGS Submit app to validate the submission package and metadata file. Note: Do not put the metadata file in the BioProject folder.
- During validation, the app will attempt to create a submission.xml metadata file for use by the WGS system based on the metadata entered into the templates, but will not transfer any files to the WGS. At submission, the app will both create the submission.xml metadata file, and transfer it and sequence files to the WGS.
- If the submission.xml file is created in the DE Analysis output folder and there are no errors, the package has successfully passed validation.
- Input: The BioProject folder (top level of the submission package) and the BioProject metadata file, with the "Validate metadata file only" box checked.
Logs folder with information on job execution.
A folder named with your CyVerse username and the top-level BioProject folder ID that contains the submission.xml (metadata file formatted for ingestion by the WGS).
- After successful completion of the run, you should see a file named "report.xml". Within that file, the message "processed-ok" indicates that the metadata is correct.
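If you download report.xml, the check for the "processed-ok" message can be scripted. This is a minimal sketch that uses a plain text search, because the exact XML layout of the report is not described here; the function name and file paths are assumptions for illustration:

```python
from pathlib import Path


def metadata_validated(report_path):
    """Return True if the report text contains the 'processed-ok' message."""
    text = Path(report_path).read_text(encoding="utf-8", errors="replace")
    return "processed-ok" in text
```

For example, `metadata_validated("report.xml")` returning `False` suggests that the metadata needs another look before submission.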
- WGS submission requires the creation of a submission template (.sbt).
- Open the "meta2tbl" app in the DE and provide the path to the BioProject metadata file created in Step 2.
- Input: The BioProject metadata file created in Step 2 and the "meta2tbl" app in the DE.
- Output: The template file named "template.sbt"
- After successful completion of the run, a "template.sbt" file is created from the metadata template file.
- Input: The minimum requirements to generate a Sequin (.sqn) file using tbl2asn are:
- Template (sbt) file (generated in Step 3)
One or more .fsa (fasta) files. Nucleotide sequences in fasta file must conform to the following standards:
There should be no gaps represented, although Ns can be used to represent sequence ambiguities.
There should be no more than 10,000 sequences per file. It is often convenient to group sequences by molecule type (e.g., chromosome) or sequence status (e.g., unplaced or unlocalized).
Typically, files will end with an .fsa extension (e.g., chr1.fsa, chr2.fsa, unknown.fsa).
- Larger submissions need to be split into multiple files.
- Submit only contigs longer than 199 nt.
- Remove any Ns from the beginning or end of each sequence.
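The sequence requirements above can be applied before running tbl2asn. The following is a minimal sketch in plain Python, not an NCBI tool; the function names are hypothetical. It trims Ns from the ends of each sequence, drops contigs of 199 nt or fewer, and groups records so that no file receives more than 10,000 sequences. It does not check for gap characters.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def clean_records(records, min_length=200):
    """Trim leading/trailing Ns, then drop contigs shorter than min_length."""
    for header, seq in records:
        trimmed = seq.strip("Nn")
        if len(trimmed) >= min_length:
            yield header, trimmed


def batches(records, max_per_file=10_000):
    """Group records into lists holding at most max_per_file sequences."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == max_per_file:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each batch can then be written to its own .fsa file, for example `for i, group in enumerate(batches(clean_records(read_fasta(open("contigs.fsa")))))`, where `contigs.fsa` is a placeholder file name.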
Optional files: These correspond to and have the same basenames as the .fsa files:
Annotation files, if appropriate. The .tbl files have a 5-column tab-delimited table of feature locations and qualifiers.
- The .qvl files that provide Phrap/Consed quality scores.
- Output: tbl2asn will generate an .sqn for every .fsa file in the directory, plus any of the corresponding optional files that may be present. The other files must have the same filename prefix as their corresponding .fsa (for example, helicase.fsa and helicase.tbl).
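Because tbl2asn pairs files by basename, it can help to list which optional files will be picked up for each .fsa file before running it. The following is a hedged sketch: `companion_files` and `tbl2asn_command` are hypothetical helper names, and the flags (`-p`, `-t`, `-M n`, `-Z discrep`) reflect the invocation commonly shown in NCBI's genome submission guidance. They are an assumption here and should be confirmed against the help output of your installed tbl2asn version.

```python
from pathlib import Path

# Optional companion files share the basename of their .fsa file.
OPTIONAL_SUFFIXES = (".tbl", ".qvl")


def companion_files(directory):
    """Map each .fsa file name to the optional files sharing its basename."""
    pairs = {}
    for fsa in sorted(Path(directory).glob("*.fsa")):
        pairs[fsa.name] = [
            fsa.with_suffix(suffix).name
            for suffix in OPTIONAL_SUFFIXES
            if fsa.with_suffix(suffix).exists()
        ]
    return pairs


def tbl2asn_command(directory, template="template.sbt"):
    """Build (but do not run) a tbl2asn command line; flags are assumptions."""
    return [
        "tbl2asn",
        "-p", str(directory),   # directory holding the .fsa/.tbl/.qvl files
        "-t", str(template),    # submission template from Step 3
        "-M", "n",              # assumed: normal genome submission mode
        "-Z", "discrep",        # assumed: write the discrepancy report
    ]
```

The returned list can be passed to `subprocess.run` once the flags have been verified for your installation.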
Check the output of the Validation and Discrepancy Report and fix problems:
- Check the errorsummary.val file for the number, severity and type of errors that are present in the .val files. All Errors and Rejects need to be fixed. Contact firstname.lastname@example.org with any questions about the validation output.
- Check the file named 'discrep' for the results of the discrepancy report.
- Categories prefaced with FATAL are always unacceptable and must be fixed.
- Some of the categories are informational.
- Reports that are not flagged as fatal need to be evaluated to determine if they represent annotation artifacts that need to be corrected or if they are acceptable due to the biology of the genome.
- See the discrepancy report examples and explanations for guidance. Write to email@example.com and send the discrep file with questions about this report.
- Make any necessary fixes to the input .fsa and/or .tbl files and run tbl2asn again. You also can make the necessary fixes directly to the .sqn file by opening it in Sequin and editing the features there.
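Screening the discrepancy report for FATAL categories can be scripted. This is a minimal sketch that assumes only what is stated above, namely that unacceptable categories are prefaced with FATAL; the function name and the sample lines in the usage note are made up for illustration and do not reproduce real report output.

```python
def fatal_categories(discrep_lines):
    """Return the report lines whose category is prefaced with FATAL."""
    return [
        line.strip()
        for line in discrep_lines
        if line.lstrip().startswith("FATAL")
    ]
```

For example, `fatal_categories(open("discrep"))` returns an empty list when nothing in the report is flagged as fatal. The remaining categories still need to be evaluated by hand, as described above.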
- Input: output.sqn generated from Step 4
- Output: output.sqn.gz
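If you prepare the file outside the DE, the same compression can be reproduced with Python's standard library. This is a sketch of an equivalent local step, not the DE app itself; the function name is hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source):
    """Write '<source>.gz' next to the original file and return its path."""
    source = Path(source)
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks; suits large files
    return target
```

Calling `gzip_file("output.sqn")` produces `output.sqn.gz` and leaves the original file in place.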
Step 5: Move sequence files to the submission package and save a BioProject metadata file
- Input: Sequence files (sqn) to be submitted to the WGS
- Output: BioProject metadata file
Caveats and suggestions
In the DE, you can open two windows and move the sqn files from one window to the other. Large files take somewhat longer to move.
If you already have sqn files, you can upload them to the DE. See the CyVerse Upload Tutorial to choose the most appropriate upload method. CyberDuck is highly recommended for your uploads.
- After the sqn files have been moved, select the top-level BioProject folder in the submission package and use the ‘Save metadata’ function to save a BioProject metadata file for the submission package. Do not use the same name as in Step 2.
- This file will serve as input into the WGS submission app in the next step (Step 6).
Step 6: Submit the submission package to the WGS
- Run the same app, without selecting the 'Validate metadata file only' option to submit.
Input: The BioProject folder (top-level of the submission package) and the BioProject metadata file (saved from the top-level of the submission package).
Logs folder with information on job execution, including a ‘manifest.txt’ file with a log of the files transferred to the WGS.
Folder named with your CyVerse username and the top-level BioProject folder ID that contains the submission.xml (metadata file formatted for ingestion by the WGS) and a submit.ready file used to signal WGS systems that submission is complete and to process the submission package.
Caveats and suggestions
- The same app will be run twice: once for validation and once for submission.
- If you made any changes to the submission package contents, or to file/folder names or metadata since last saving the BioProject metadata file, remember to resave the BioProject metadata file before running an app.
- The information buttons in the Apps (to the left of the app name in the Apps list) provide important details.
- The Validation stage is optional but can substantially reduce the number of errors detected by the WGS. It is recommended for first-time users.
- For either validation or submission, if the app fails and no submission.xml file is created, there are one or more errors in the submission package. See the Analysis log files (especially condor-stderr-0) for information to assist with error correction.
- Successful validation within the DE does not guarantee that the WGS will not detect additional errors.
- No actual analyses are performed. Metadata will be aggregated into the submission.xml file (Validation and Submission stages) and the package will be transferred to the WGS (Submission Stage).
- To retrieve the submission report, select the “NCBI_Report_Download” app, and as input, select the CyVerse analysis output folder generated in Step 6. It will be named with your CyVerse username. The report will be fetched from the WGS and placed in a new analysis output folder generated by the retrieval app. To resubmit, make necessary changes to the submission package data and metadata, resave a BioProject metadata file from the top-level folder of the submission package, and resubmit with the appropriate WGS submission app.
After successful processing, you should receive an email notification.
If you encounter any issues during WGS submission, please send an email to firstname.lastname@example.org.