The Data Commons publishes data to our own repository at datacommons.cyverse.org as well as to external repositories. All data published to Data Commons Curated Data receive a permanent identifier (PID) in the form of a DOI (Digital Object Identifier) or ARK (Archival Resource Key) and are expected to be stable and permanent. Data published to the CyVerse Community Contributed folder do not have PIDs and may change or be removed at any time. The sections below provide more information on each kind of data publication available through CyVerse. For more details on the range of data sharing options in CyVerse, see the CyVerse Data Policy.
Publishing Data Commons Curated Data
CyVerse provides a landing page for each public dataset. Each landing page is populated with the metadata provided by the user.
DOIs are assigned at the request of the project lead. A DOI is a type of global identifier that allows a digital object to be persistently referenced on the Internet even if the item is moved to another online repository. DOIs use the DataCite metadata schema for purposes of citation. However, for data to be reused, more descriptive information is required, so we encourage users to further document their datasets. Please see ……
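As a rough illustration of how a DOI acts as a persistent reference, the sketch below turns a DOI string into a link through the public doi.org resolver, which redirects to wherever the object currently lives. The function name `doi_to_url`, the simplified pattern check, and the example DOI `10.1234/example-dataset` are assumptions made for illustration only; they are not part of any CyVerse tool.

```python
import re

# Simplified DOI shape: "10.", a numeric registrant code, "/", then a suffix.
# This is an illustrative check, not a full validation of the DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the doi.org resolver URL for a DOI string (hypothetical helper)."""
    cleaned = doi.strip()
    # Accept the common "doi:" prefix and drop it.
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    if not DOI_PATTERN.match(cleaned):
        raise ValueError(f"not a DOI: {doi!r}")
    return f"https://doi.org/{cleaned}"


if __name__ == "__main__":
    # Hypothetical DOI used only to show the URL form.
    print(doi_to_url("doi:10.1234/example-dataset"))
```

Because the link goes through the resolver rather than pointing at a specific server, a citation that uses it keeps working if the dataset is moved to another repository.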
What about data that has already been published elsewhere?
If an upload involves data that has been published elsewhere and/or has an existing DOI, project leads can reference those datasets using the External URL box. The existing DOI and/or a link can be added to the dataset information.
After publication, data creators can request that their data be withdrawn from the repository. To do so, the user must contact the repository curator and provide a justification. A record will remain in place stating that the dataset was available, including an abstract and an explanation of why the data was removed. Note that the dataset's DOI will remain active, so that people who follow it from a citation can verify that the data is no longer available.
Publishing CyVerse Community Contributed Data
Publishing to external repositories
SRA pipeline: Data Commons enables CyVerse users to make submissions directly to the NCBI Sequence Read Archive (SRA). A submission consists of compressed sequence files (FASTQ.gz, SFF.gz, and BAM.gz) and an XML metadata file, organized into a submission package.
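Before assembling a submission package, it can help to confirm that the files match the types listed above. The following is a minimal sketch that sorts file names into sequence files, XML metadata files, and everything else. The function names and the rule "at least one sequence file and at least one XML file" are assumptions for illustration; the actual checks performed by the SRA pipeline are not described here.

```python
# File name endings taken from the list above (compared case-insensitively).
SEQUENCE_SUFFIXES = (".fastq.gz", ".sff.gz", ".bam.gz")


def classify_package(filenames):
    """Sort file names into (sequence files, XML metadata files, other files)."""
    sequences, metadata, other = [], [], []
    for name in filenames:
        lower = name.lower()
        if lower.endswith(SEQUENCE_SUFFIXES):
            sequences.append(name)
        elif lower.endswith(".xml"):
            metadata.append(name)
        else:
            other.append(name)
    return sequences, metadata, other


def looks_complete(filenames):
    """Assumed rule: a package needs sequence data and XML metadata."""
    sequences, metadata, _ = classify_package(filenames)
    return bool(sequences) and bool(metadata)


if __name__ == "__main__":
    # Hypothetical file names used only to demonstrate the check.
    files = ["reads_1.fastq.gz", "reads_2.fastq.gz", "submission.xml", "notes.txt"]
    print(classify_package(files))
    print(looks_complete(files))
```

Files that land in the third group, such as `notes.txt` in the example, are worth reviewing by hand before the package is submitted.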
WGS pipeline and TSA: Coming soon