Draft - Pending final approval

CyVerse Data Commons User Agreement

This agreement is between CyVerse and the users of the CyVerse Data Commons. Using services or data available at the Data Commons (DC), or submitting data to it, requires agreeing to the following policies. This document covers only policies specific to the DC. The CyVerse Data Policy covers policies relevant to any data hosted by CyVerse, including data in the DC. Acceptance of this document implies acceptance of the CyVerse Data Policy. Please see other CyVerse Policies for general usage of CyVerse cyberinfrastructure.

About the Data Commons

The Data Commons (DC) provides services within the CyVerse cyberinfrastructure to organize, preserve, and publish data derived from scientific research. We strive to aid researchers in creating, managing, publishing, reusing, and discovering research data by:

  1. Facilitating metadata entry and acquisition

  2. Supporting the translation of metadata across existing metadata standards such as DataCite, Dublin Core, or MIxS

  3. Publishing data through the Data Commons Repository or to external repositories

  4. Providing access to public data that is in the CyVerse Data Store

  5. Providing persistent access to datasets through globally unique, permanent identifiers (DOIs and ARKs)

  6. Connecting data to analyses conducted on CyVerse platforms to support reproducible science

  7. Raising data visibility and discoverability

  8. Preserving datasets in secure and reliable large-scale storage systems

DC development builds on foundational CyVerse infrastructure such as our Data Store, APIs, and user interfaces, while expanding into new areas such as metadata and ontologies, a data repository, and federation with external collaborators and repositories. Key components of the Data Commons are the web data portal at http://datacommons.cyverse.org/ (also http://dc.cyverse.org) and functions within the Discovery Environment such as metadata templates, permanent identifier requests, data submissions to NCBI, and a Projects Interface (under development).

Data Commons Mission and Vision 

Vision Statement

To provide infrastructure for open data where researchers can organize, preserve, and publish data derived from scientific research and where data can live as a searchable, discoverable, and reusable resource.

Mission Statement

To aid researchers in creating, managing, publishing, reusing, and discovering data.

Data Commons Functionalities

Data Organization and Curation

Data curation is the set of processes involved in generating and maintaining a sustainable, complete, and accurate dataset across time. In the DC, users are the primary, specialized curators of their own data, because they know their data and how it was produced. DC users are responsible for organizing and describing their data in a way that represents their research. To facilitate these activities, the DC provides functions through the Discovery Environment, including metadata templates and bulk metadata upload, where users can organize and append standardized metadata to the data that they will publish. In addition, data curators on the DC team are available for consultation about how to organize data, which metadata standards are recommended for your data, and how to assign identifiers.

A data curator verifies all datasets that are submitted for publication and will contact users about incomplete metadata or any other issue that should be addressed so that the data are presented to the public in a way that is clear for reuse. The curator does not verify the contents or quality of the data files; this is the responsibility of the researcher creating the dataset. Data in the DC is not peer reviewed, but it may be reviewed outside CyVerse as part of a journal article.

Data Hosting and Publication

The DC hosts public data (data accessible without a user account) that is stored in Public Data Folders under the directory /iplant/home/shared and allows CyVerse users to publish data through the Data Commons Repository (DCR). The DC also supports publication to selected External Repositories. See the CyVerse Data Policy for a detailed description of the different types of data stored at CyVerse. The key difference between Public Data Folders and the DCR is that Public Data Folders are controlled by a community member and subject to change, whereas folders in the DCR are unchanging and can only be updated by DC curators.

The following points apply to all data made available through the DC, either in Public Data Folders or the DCR:

Public Data Folders

Public Data Folders are available for evolving datasets that individuals or communities want to make available as quickly as possible for research and reuse. Public Data Folders are intended for datasets that are growing or changing frequently or that may not need long-term preservation. Data can transition from a Public Data Folder to publication in the DCR.

In addition to the policies above for all data in the DC, the following apply specifically to data in Public Data Folders:

Data Commons Repository (DCR)

Data publication to the DCR is a service offered for datasets that are intended to be stable and permanent. For published data, the DCR provides landing pages and permanent Digital Object Identifiers (DOIs) or Archival Resource Keys (ARKs), and it requires an open data license. Permanent identifiers allow data to have a stable location on the web so that other users can always find it, along with the information that makes it understandable, citable, and reusable. An open data license is important to allow others to reuse your data, but it does not exempt users from the obligation to correctly cite your data.

In addition to the policies above for all data in the DC, the following apply specifically to data in the DCR:

External Repositories 

The DC provides documented, easy-to-use workflows for users who want to publish data through canonical repositories such as NCBI.

Reusing Data

The DC fully supports reuse of the data it hosts. If you download or reuse any data in the DC, you must:

 

New data derived from original DC data may be distributed only under terms and conditions established by the creators of the data and stated in the license.

Long-term Preservation and Access to Data in the DCR

Data in the DCR are stored in a high-performance storage resource that has built-in redundancy and is continuously monitored for security and failure, and they are synchronously backed up at both the University of Arizona in Tucson and the Texas Advanced Computing Center in Austin. At ingest into the DCR, data are manually checked for organization, format (to ensure that they are readable by non-proprietary software), completeness of metadata, and inclusion of a ReadMe file. An MD5 checksum is generated and displayed as part of the file’s metadata so that users can check that their copy of the file matches the published one.
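
As an illustration of how a user might compare a downloaded file against the MD5 checksum shown in its metadata, the following is a minimal Python sketch. The function names and the chunk size are assumptions made for this example; they are not part of any CyVerse tool or API, and the checksum value must be copied by hand from the file's metadata.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of the file at `path`.

    The file is read in chunks so that large datasets do not need
    to fit in memory. `chunk_size` is an arbitrary choice.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path, published_md5):
    """Compare a local file against a checksum copied from its metadata.

    Whitespace and letter case in `published_md5` are ignored.
    """
    return md5_of_file(path) == published_md5.strip().lower()
```

A mismatch indicates that the local copy differs from the file that was published, for example because a download was interrupted.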

 

Data and metadata in the DCR are visible to anyone via the Data Commons web interface and via all methods described in Downloading Data with a User Account. Through a contract with EZID, CyVerse is committed to the long-term preservation of data in the DCR. If DCR services are discontinued for some reason, we will make arrangements to transfer the published data free of charge to another long-term repository that will sustain access to the data and metadata, and the DOIs will be redirected to the new location. All CyVerse users will be notified of the new location of DCR data before the move is completed.

Data and metadata in Public Data Folders in the DC (not the DCR) are not guaranteed long-term preservation. Public Data Folders that are in active use (have been accessed in the past year) are available via browsing at datacommons.cyverse.org, can be accessed without a CyVerse user account using any of the methods described in Downloading Data without a User Account, and are searchable by anyone with a CyVerse user account through the Discovery Environment. Data in Public Data Folders that are inactive (have not been accessed in over one year) may be moved to a long-term storage archive, where the data will be available upon request. The owner of the Public Data Folder will be notified before data is moved to a storage archive.

Disclaimers

THE SERVICES AND DATA OF THE CYVERSE DATA COMMONS ARE PROVIDED “AS IS”. NO WARRANTIES OR REPRESENTATIONS ARE MADE RELATING TO THE DC OR ANY DOCUMENTATION. NO WARRANTY IS PROVIDED THAT THE DATA COMMONS PORTAL OR ANY DATA WILL SATISFY ANY REQUIREMENTS, THAT THE DC OR ANY OF THE DATA THEREIN IS WITHOUT DEFECT OR ERROR, OR THAT OPERATION OF CYVERSE WILL BE UNINTERRUPTED. ALL TERMS AND CONDITIONS OF THE CYVERSE SERVICE LEVEL AGREEMENT, CYVERSE DATA POLICY, AND CYVERSE ACCEPTABLE USE POLICY APPLY TO THE DATA COMMONS, INCLUDING THE FOLLOWING POINTS:

Agreement and Policy Subject to Change

The functionalities, business model, and characteristics of the DC are continually improving; the details of this agreement and policy are therefore subject to revision every 3 months.