Overview

This workflow enables CyVerse users to make submissions to the NCBI Sequence Read Archive (SRA). A submission consists of compressed sequence files (FASTQ.gz, SFF.gz, and BAM.gz) and an XML metadata file, organized into a submission package. If you need to submit an alternative file format (HDF5, SOLiD, or SRF), please post a question to the CyVerse Ask forum (see How to get help below).

How to get help

Before You Start

Before You Start: Carefully read this tutorial.

Before You Start: Review the example input and output data and metadata for this tutorial in the Discovery Environment Data window under Community Data -> iplantcollaborative -> example_data -> SRA_submission.

Before You Start: You must have an NCBI account to submit. You can obtain an NCBI account here.

Before You Start: Before you can submit from CyVerse, you must have logged into the SRA submitter system with your NCBI account credentials at least once. To make sure you have done so, go to the SRA homepage, click the 'Submit' tab at the top of the page, click the 'NCBI PDA - NCBI Primary Data Submitters' link, and authenticate if prompted.

Before You Start: Be aware that submission is not complete until you receive final notification from the SRA that your data have been received, processed, and will be released on the specified date.

General Submission Steps and Important Information


Step 1 -  Upload compressed sequence files into the CyVerse Discovery Environment (DE).

Step 2 - Create submission package folders and add compressed sequence files. The submission package is created using tools in the DE. Submission packages have three levels: BioProject, BioSample, and Library. Package organization is similar to the SRA organization detailed in the NCBI Quick Start Guide. Within the DE, data and metadata for SRA-defined ‘Experiments’ and ‘Runs’ are part of the Library level of the submission package.

Step 3 -  Add metadata to every folder in the submission package.  BioProject, BioSample, and Library metadata are entered using metadata templates in the DE.  After all metadata has been added, save a single metadata file from the BioProject-level folder.

Step 4 - In a 2-stage process, select the appropriate SRA Submission App to first validate the submission package and then, after successful validation, to submit to the SRA.  For validation, the App will attempt to create a submission.xml metadata file for use by the SRA system based on the metadata entered into the templates, but will not transfer any files to the SRA.  For submission, the App will both create the submission.xml metadata file and transfer it and all compressed sequence files to the SRA.

Step 5 -  The submission package will be validated by the SRA system and email notifications will be sent by the SRA to the contact email added in the BioProject metadata to confirm successful submission, or to communicate submission errors.

Step 6 - If error correction and resubmission are needed, the SRA-generated error report can be retrieved with the 'NCBI SRA Submission Report Retrieval' App. Corrections to the submission package can be made within the DE, and resubmission follows the same process.

Video Overviews


Detailed Submission Steps

Step 1) Upload Compressed Sequence Files to the Discovery Environment 

If your compressed sequence files are already in the Discovery Environment (DE), proceed to Step 2. For instructions on managing data and metadata and running analyses in the DE, see the DE manual. If you are unsure how to upload or compress your files, see the "Caveats and Suggestions" section in this step.

Figure 1 - gzip Example

Figure 2 - DE Compression App. The left pane shows a Data window used to create an HT Analysis Path List file for submitting a list of files to an App. The right pane shows the ‘Compress files with gzip’ App with an HT Analysis Path List file as input. This App also accepts single files as input.
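If you would rather compress files on your own computer before uploading, instead of using the DE App, the command-line gzip tool shown in Figure 1 works, and so do a few lines of Python. The sketch below uses only the standard library (gzip, shutil); the function name gzip_file is hypothetical, and the ‘Compress files with gzip’ App remains the route this tutorial documents.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(src: str) -> Path:
    """Compress src to src + '.gz' and return the new path (hypothetical helper).

    The original file is left untouched; remove it yourself once you
    have confirmed the compressed copy.
    """
    src_path = Path(src)
    dst_path = src_path.with_name(src_path.name + ".gz")
    # Stream in chunks so large FASTQ files are never held in memory.
    with open(src_path, "rb") as fin, gzip.open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst_path
```

For example, gzip_file("sample_R1.fastq") writes sample_R1.fastq.gz next to the original. Streaming with shutil.copyfileobj keeps memory use flat even for multi-gigabyte FASTQ files.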

Step 2) Create and organize submission package

Create submission package folders and add compressed sequence files. The submission package is created using tools in the DE. Submission packages have three levels: BioProject, BioSample, and Library. Package organization is similar to the SRA organization detailed in the NCBI Quick Start Guide. Within the DE, data and metadata for SRA-defined ‘Experiments’ and ‘Runs’ are part of the Library level of the submission package.

 An example of a submission package is in the Discovery Environment Data window under Community Data -> iplantcollaborative -> example_data -> SRA_submission -> 0_Submission_Input -> BioProject_Create_Example.

An SRA submission package contains a BioProject folder with one or more BioSample folders, each of which contains one or more Library folders. Each Library folder contains one or more compressed sequence files. Use the Discovery Environment (DE) ‘Create NCBI SRA Submission Folder’ tool to create the submission package, then add your compressed files to the Library folders.
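Before uploading, it can help to sanity-check the layout on a local copy of the package. The sketch below assumes a hypothetical local directory tree that mirrors the DE package (a BioProject folder, then BioSample folders, then Library folders) and treats only the .fastq.gz, .sff.gz, and .bam.gz suffixes from the Overview as valid sequence files. Both are simplifying assumptions (a file named .fq.gz, for instance, would be flagged), and the DE tools remain the authoritative way to build the package; check_package is a hypothetical helper.

```python
from pathlib import Path

# ASSUMPTION: suffixes taken from the Overview (FASTQ.gz, SFF.gz, BAM.gz).
ACCEPTED_SUFFIXES = (".fastq.gz", ".sff.gz", ".bam.gz")


def check_package(bioproject_dir: str) -> list[str]:
    """Return layout problems found under a local BioProject folder.

    An empty list means every BioSample has at least one Library and
    every Library holds at least one compressed sequence file.
    """
    problems = []
    root = Path(bioproject_dir)
    biosamples = [p for p in sorted(root.iterdir()) if p.is_dir()]
    if not biosamples:
        problems.append(f"{root.name}: no BioSample folders")
    for biosample in biosamples:
        libraries = [p for p in sorted(biosample.iterdir()) if p.is_dir()]
        if not libraries:
            problems.append(f"{biosample.name}: no Library folders")
        for library in libraries:
            seq_files = [
                f for f in library.iterdir()
                if f.is_file() and f.name.lower().endswith(ACCEPTED_SUFFIXES)
            ]
            if not seq_files:
                problems.append(
                    f"{biosample.name}/{library.name}: no compressed sequence files"
                )
    return problems
```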

Figure 3 - Create Submission Package

Figure 4 - Entering Submission Package Information

Figure 5 - Example Submission Package

Step 3) Enter metadata at each level of the submission package and save a BioProject metadata file

Add metadata to every folder in the submission package. BioProject, BioSample, and Library metadata are entered using metadata templates in the DE. After all metadata has been added, save a single metadata file from the BioProject-level folder.

An example of submission package metadata is in the Discovery Environment Data window under Community Data -> iplantcollaborative -> example_data -> SRA_submission -> 0_Submission_Input -> BioProject_Create_Example, where you can view the metadata for each folder.
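The DE stores this metadata through its templates rather than in files you edit directly, so the following is only a conceptual sketch. It uses a hypothetical in-memory model (Folder, folders_missing_metadata) of the three-level package and flags any folder whose metadata is empty or contains blank values, as a reminder that every level needs its own metadata before you save the BioProject metadata file. The field names you would put in the dictionaries are up to the templates, not this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """Hypothetical in-memory stand-in for one submission package folder."""
    name: str
    level: str  # "BioProject", "BioSample", or "Library"
    metadata: dict[str, str] = field(default_factory=dict)
    children: list["Folder"] = field(default_factory=list)


def folders_missing_metadata(folder: Folder, path: str = "") -> list[str]:
    """Depth-first walk listing folders with empty metadata or blank values."""
    here = f"{path}/{folder.name}" if path else folder.name
    complete = bool(folder.metadata) and all(
        v.strip() for v in folder.metadata.values()
    )
    missing = [] if complete else [f"{here} ({folder.level})"]
    for child in folder.children:
        missing.extend(folders_missing_metadata(child, here))
    return missing
```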

Enter metadata via pulldown templates for each folder level (BioProject, BioSample, Library)

 

Figure 6 - Open Metadata Templates: From the menu, select Edit -> Edit Metadata.

Figure 7 - Select the Appropriate BioProject Metadata Template: NCBI BioProject Creation or NCBI BioProject Update.

Figure 8 - Metadata Copying

Figure 9 - Save a BioProject Metadata File From the Top-Level BioProject Submission Package Folder

Step 4) Submit package to the SRA

In a 2-stage process, select the appropriate SRA Submission App to first validate the submission package and then, after successful validation, to submit to the SRA.  For validation, the App will attempt to create a submission.xml metadata file for use by the SRA system based on the metadata entered into the templates, but will not transfer any files to the SRA.  At submission, the App will both create the submission.xml metadata file and transfer it and all compressed sequence files to the SRA.

 An example of submission output is in the Discovery Environment Data window under Community Data -> iplantcollaborative -> example_data -> SRA_submission -> 1_Submission_Output -> BioProject_Create_Example.

Step 4a - Validation - Select either the ‘NCBI SRA Submission - BioProject Creation’ or the ‘NCBI SRA Submission - BioProject Update’ App, whichever matches the BioProject metadata template you used. Run the App with the 'Validate metadata file only' option selected (see the tutorial for running Apps in the DE). If a submission.xml file is created in the DE Analysis output folder, the package has passed validation; move on to submission. If validation fails, check the log files to find out why.
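The success criterion in Step 4a is simple enough to check mechanically. Assuming you have downloaded the Analysis output folder to your computer (the local copy is an assumption; the DE does not require it), the hypothetical helper below reports whether a submission.xml file exists anywhere under that folder. As an extra sanity check not described in this tutorial, it also confirms the file is well-formed XML.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def passed_validation(output_dir: str) -> bool:
    """True if a well-formed submission.xml exists anywhere under output_dir."""
    for candidate in Path(output_dir).rglob("submission.xml"):
        try:
            ET.parse(candidate)  # raises ParseError if the XML is malformed
        except ET.ParseError:
            continue
        return True
    return False
```

A False result means you should open the log files in the same output folder, as Step 4a describes, rather than proceeding to submission.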

Step 4b - Submission - After successful validation, run the same App used in the validation stage, this time without selecting the 'Validate metadata file only' option.

Step 5) Receive submission notification from the SRA (sent to the contact email address you provided in the BioProject metadata template)

The submission package will be validated by the SRA system and email notifications will be sent by the SRA to the contact email added in the BioProject metadata to confirm successful submission, or to communicate submission errors.

What happens at the SRA? CyVerse systems connect to SRA systems and create the submission folder on the SRA side. The files are transferred, and then a submit.ready file is sent to the SRA to signal that the submission package is complete and processing can begin. The SRA system validates the submission package and generates a report.xml file listing any errors it detects. The SRA system then sends notification email(s) to the contact email provided in the BioProject metadata template, and to the CyVerse team, reporting either a successful or a failed submission.

The first email will be titled "Submission ownership transfer". Follow the instructions in that email to transfer ownership of the submission to the NCBI user named in the package metadata. After ownership transfer, you can view the submission progress at https://submit.ncbi.nlm.nih.gov/subs/. You may need to log in with the credentials of the NCBI account you used in the submission metadata. If further notification from the SRA reports errors, you can retrieve the report.xml file from SRA servers with the 'NCBI SRA Submission Report Retrieval' App in the DE, make corrections, and resubmit (see Step 6).
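The "transfer every file, then send a marker" handshake described above means the SRA never starts processing a half-uploaded package. The sketch below simulates that ordering with local directories only; in the real workflow CyVerse systems perform the transfer to SRA servers, and the functions upload_package and ready_for_processing are hypothetical illustrations, not part of either system.

```python
import shutil
from pathlib import Path

MARKER = "submit.ready"


def upload_package(files: list[str], remote_dir: str) -> None:
    """Simulated sender: copy every file first, create the marker last."""
    dest = Path(remote_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy2(f, dest / Path(f).name)
    # Only after all copies succeed does the receiver get the go-ahead.
    (dest / MARKER).touch()


def ready_for_processing(remote_dir: str) -> bool:
    """Simulated receiver: ignore the folder until the marker appears."""
    return (Path(remote_dir) / MARKER).exists()
```

Writing the marker as the final step is what makes the protocol safe: if a copy fails partway, no marker exists, so the receiver never sees an incomplete package as ready.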

Step 6) If the SRA detects submission errors, retrieve SRA-generated submission report, correct, and resubmit  

If error correction and resubmission are needed, retrieve the SRA-generated error report with the 'NCBI SRA Submission Report Retrieval' App and use it to correct the errors. Corrections can be made within the DE by updating the submission package organization or metadata; then resubmit, beginning again with Step 4.

 An example of a retrieved submission report is in the Discovery Environment Data window under Community Data -> iplantcollaborative -> example_data -> SRA_submission -> 2_SRA_Report_Retrieval_Output -> BioProject_Create_Example.

 

To retrieve the submission report, select the 'NCBI SRA Submission Report Retrieval' App and, as input, select the CyVerse Analysis output folder generated during the last submission; its name begins with your CyVerse username. The report will be fetched from the SRA and placed in a new Analysis output folder generated by the retrieval App. To resubmit, make the necessary changes to the submission package data and metadata, re-save a BioProject metadata file from the top-level folder of the submission package, and resubmit with the appropriate SRA submission App (Create or Update).

Figure 10 - Selecting the SRA Submission Analysis Output Folder as Input for the SRA Submission Report Retrieval App: Navigate to the Analysis output folder for the report you want to retrieve (see the path underlined in red for an example), then select the folder that begins with your CyVerse username (see the folder circled in blue for an example).
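Once the retrieval App has placed report.xml in its output folder, you can skim it for problems programmatically. This tutorial does not describe the report's structure, so the sketch below rests on an assumption: that problems are flagged either by a status attribute containing "error" or by Message elements whose severity contains "error". Verify these names against a real report (for example, the one in the 2_SRA_Report_Retrieval_Output example folder) before relying on the output; summarize_report is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def summarize_report(xml_text: str) -> list[str]:
    """List error-looking entries in an SRA report.xml.

    ASSUMPTION: errors are flagged by a 'status' attribute containing
    'error' and/or by <Message severity="..."> elements whose severity
    contains 'error'. Check these names against your own report.
    """
    root = ET.fromstring(xml_text)
    findings = []
    for elem in root.iter():  # includes the root element itself
        status = elem.get("status", "")
        if "error" in status.lower():
            findings.append(f"{elem.tag}: status={status}")
        if elem.tag == "Message" and "error" in elem.get("severity", "").lower():
            findings.append(f"Message: {(elem.text or '').strip()}")
    return findings
```

For example, summarize_report(Path("report.xml").read_text()) returns the flagged entries in document order. An empty list only means none of the assumed error markers were found, not that the submission succeeded; the SRA's notification emails remain the final word.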
