
Requesting DOIs and ARKs

The Data Commons Repository (DCR) is the ideal platform for ease of data reuse. It can store very large datasets, which are difficult to transfer, upload, and download across different computers and platforms. From the DCR, data is accessible to CyVerse's suite of large-scale computational analysis resources, so that users can seamlessly analyze, manage, and publish new results. Within the DCR, your data will be very secure and will have a permanent identifier (a DOI or ARK) for proper citation.

If you are interested in obtaining a permanent identifier for one or more datasets, please read the Permanent Identifier FAQs and then answer the questions on the page, Is the CyVerse Data Commons Repository right for my data?

Your public dataset is the representation of your research. The DCR wants to help you publish complete and well documented datasets by providing tools, guidelines, and permanent identifiers so that your research is showcased in a clear and thorough fashion, allowing others to learn about your project and reuse it. However, you are responsible for the contents and presentation of your dataset.

Requesting a DOI

Step 1: Before you begin, review the related pages.

Step 2: Organize the dataset in the CyVerse Data Store.

There are several steps to properly organizing your dataset. These include determining what data to include, how many identifiers to request, how to organize the data into folders, and creating the ReadMe file and data inventory.

 Learn how to organize your data

Step 2.1. Determine what to include

A data collection may be composed of multiple files and different datasets. In preparing your data for publication:

  1. Identify the data and other materials that you consider useful for validation and reuse of your research:
    • Data associated with a research project may include multiple files with different roles.
    • If there are components of your dataset that belong in a public repository such as NCBI (e.g., fastq files), submit them to the repository, rather than to CyVerse Curated Data.
  2. Beyond data, you will include the ReadMe file (see Step 2.5), and you may include scripts or links to scripts to run your analysis.

Step 2.2. Determine how many permanent identifiers to request

To determine how many DOIs to request for a given data collection, consider the following:

  • Think about its size and components.
  • How many studies or publications does it represent?
  • Is your data collection formed by different datasets and are those likely to be used separately?
  • Do you want to create a data collection with one DOI for the entire project and additional related DOIs for distinct datasets so that they are cited individually?

If you are uncertain about how many DOIs to request, contact us at

Step 2.3. Organize your data into folder(s)

  1. Organize your data so that there is one folder for each DOI (see CyVerse Curated Data folder-naming guidelines for naming conventions).
  2. Within a folder, include all files in your data package plus the ReadMe file and the inventory.
    • You may have subfolders within a data package.
    • You may include compressed files in a package, as described on the Permanent Identifier FAQs, but do not compress the entire folder/package.
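The folder rules above can be checked locally before uploading. The following is a minimal sketch, not an official CyVerse tool: the file-name prefixes (`readme`, `inventory`), the list of archive suffixes, and the function name `check_package` are assumptions made for illustration only.

```python
from pathlib import Path

# Assumed name prefixes; this guide does not fix exact file names.
README_PREFIX = "readme"
INVENTORY_PREFIX = "inventory"
ARCHIVE_SUFFIXES = (".zip", ".tar", ".gz", ".tgz", ".bz2", ".7z")


def check_package(folder: Path) -> list[str]:
    """Return human-readable problems found in one data package folder."""
    if not folder.is_dir():
        return ["package must be a folder, not a single (possibly compressed) file"]

    entries = list(folder.iterdir())
    file_names = [p.name.lower() for p in entries if p.is_file()]
    has_subfolders = any(p.is_dir() for p in entries)

    problems = []
    if not any(n.startswith(README_PREFIX) for n in file_names):
        problems.append("no ReadMe file at the top level")
    if not any(n.startswith(INVENTORY_PREFIX) for n in file_names):
        problems.append("no inventory file at the top level")

    # Heuristic (an assumption): if the only data content is one archive,
    # the whole package was probably compressed.
    data_files = [n for n in file_names
                  if not n.startswith((README_PREFIX, INVENTORY_PREFIX))]
    if (not has_subfolders and len(data_files) == 1
            and data_files[0].endswith(ARCHIVE_SUFFIXES)):
        problems.append("package content is a single archive; upload the files uncompressed")
    return problems
```

Calling `check_package(Path("my_package"))` on a local copy of the folder returns an empty list when none of these checks fail. Individual compressed files inside a package are still allowed, so the sketch only flags the case where one archive is the entire content.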

Step 2.4. Name your top-level folder according to the guidelines

The folder containing your dataset should be named using the $Creator_$subject_$date format.

For more details on folder naming, see the CyVerse Curated Data Folder-Naming Guidelines.
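As an illustration of the $Creator_$subject_$date pattern, here is a minimal sketch that assembles a folder name from its three parts. The cleaning rule (keeping only letters, digits, and hyphens inside each part) and the function name `build_folder_name` are assumptions for this example; the date format is not specified on this page, so the sketch passes it through unchanged. Check the folder-naming guidelines for the actual rules.

```python
import re


def build_folder_name(creator: str, subject: str, date: str) -> str:
    """Join the three parts with underscores after basic cleaning."""
    parts = []
    for label, value in (("creator", creator), ("subject", subject), ("date", date)):
        # Assumption: drop spaces, underscores, and punctuation inside a part
        # so that underscores only separate the three parts.
        cleaned = re.sub(r"[^A-Za-z0-9-]", "", value)
        if not cleaned:
            raise ValueError(f"{label} is empty after cleaning")
        parts.append(cleaned)
    return "_".join(parts)


print(build_folder_name("Smith", "MaizeRoots", "2017"))  # Smith_MaizeRoots_2017
```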

Step 2.5. Create a ReadMe file

Create a text file labeled "readMe" with the following information:

Step 2.6. Create an inventory
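The required inventory layout is not described on this page. As a rough sketch only, assuming a tab-separated listing of each file's relative path and size in bytes is acceptable, an inventory could be generated from a local copy of the package like this. The output name `inventory.tsv`, the column names, and the function `write_inventory` are hypothetical.

```python
import csv
from pathlib import Path


def write_inventory(package: Path, out_name: str = "inventory.tsv") -> int:
    """Write a file listing for `package` and return the number of files listed."""
    out_path = package / out_name
    # Collect files first so the inventory never lists itself.
    files = sorted(p for p in package.rglob("*") if p.is_file() and p != out_path)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["relative_path", "size_bytes"])
        for path in files:
            writer.writerow([path.relative_to(package).as_posix(), path.stat().st_size])
    return len(files)
```

Running `write_inventory(Path("my_package"))` places the listing at the top level of the package folder, alongside the ReadMe file.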

Step 2.7. Supporting documents on data management and organization

Here is a useful guide to data organization: Research Data Management: File Organization (PDF).

Step 3: Submit the request for the DOI.

  1. In the Data window, click the checkbox next to the folder.
  2. Select Metadata > Request DOI.

  3. After confirming that you have read the manual (this page), click I need a DOI. You will receive an email confirming that your request was received, and a notification will appear in the Notifications list in the DE.

Step 4: Wait for CyVerse validation checks.

After you submit your request, a CyVerse DCR curator begins validating your dataset, its metadata, and its overall configuration. Validation is based solely on the required DOI metadata, the folder-naming conventions, and the dataset's potential utility to CyVerse and the larger scientific community, not on the quality of your data.

  • If the curator determines that the dataset is adequately organized and the DataCite metadata are accurate, they will provide a DOI, and you will be notified of the DOI and the location of its corresponding landing page in the Community Data > commons_repo > curated folder in the DE.
  • If the curator determines that minor changes are needed, they may make those changes themselves.
  • If the curator determines that substantive changes are needed, they will contact you with required changes.
  • If the curator determines that your dataset is not appropriate for the DCR, you will be notified.

To check the status of your request, click Notifications at the top right of the DE screen. For more information on using notifications in the DE, see Viewing and Deleting Notifications.

Requesting an ARK

(Coming Soon — will be similar to a DOI.)

Getting your dataset noticed

Metadata, the description of your data, is key to getting your dataset noticed on the World Wide Web. Search engines and bibliographic aggregators index the metadata that you create to obtain a DOI. Thus, it is important that you do the following:

  • Make sure the metadata is complete.
  • Include descriptive terms about the science and themes involved in your research.
  • Include methods used to generate the dataset.
  • Include relevant terms both as keywords and in the abstract, which should be precise.
  • Describe the dataset for a broader audience so that they understand your research.
  • If you or your team members have an ORCID iD, make sure to include it when you edit the authors' names.
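A mistyped ORCID iD in the author metadata is easy to miss. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a typo can often be caught before the metadata is submitted. The sketch below is an optional local check, not part of the DE; the function names are chosen for illustration.

```python
def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length and check character of an iD such as 0000-0002-1825-0097."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_char(base) == compact[15]
```

A passing check only means the iD is well formed; it does not confirm that the iD belongs to the intended author.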

Publicizing your dataset

There are several ways to publicize your dataset:

  • Consider using social media to share the DOI of your dataset, and tag CyVerse.
  • If you have an interesting story about your data, contact us at, and we may be able to share it through CyVerse outreach.
  • If you have a tool or workflow you developed to analyze your data in CyVerse, consider presenting it as part of our Focus Forums webinars series.