DATA COMMONS USER MANUAL

What is FAIR Data?

FAIR data principles (Wilkinson et al. 2016) provide a set of basic requirements for making research data Findable, Accessible, Interoperable, and Reusable.

The FAIR Guiding Principles (Box 2 from Wilkinson et al. 2016)

 To be FAIR...

To be Findable:

F1. (meta)data are assigned a globally unique and persistent identifier

F2. data are described with rich metadata (defined by R1 below)

F3. metadata clearly and explicitly include the identifier of the data it describes

F4. (meta)data are registered or indexed in a searchable resource

To be Accessible:

A1. (meta)data are retrievable by their identifier using a standardized communications protocol

A1.1 the protocol is open, free, and universally implementable

A1.2 the protocol allows for an authentication and authorization procedure, where necessary

A2. metadata are accessible, even when the data are no longer available

To be Interoperable:

I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation

I2. (meta)data use vocabularies that follow FAIR principles

I3. (meta)data include qualified references to other (meta)data

To be Reusable:

R1. (meta)data are richly described with a plurality of accurate and relevant attributes

R1.1. (meta)data are released with a clear and accessible data usage license

R1.2. (meta)data are associated with detailed provenance

R1.3. (meta)data meet domain-relevant community standards

The CyVerse Data Commons is working to ensure that each of these principles is met for datasets published through our CyVerse Curated Data program (datasets with DOIs), and is working with community members to make Community Released Data as FAIR as possible. 

FAIR data throughout the data life cycle

Most scientists are used to generating and analyzing data, and increasingly they are also publishing their data or discovering and reusing published data. The FAIR principles describe the features that published data should have to be findable, accessible, interoperable, and reusable, but the best way to make data FAIR is to plan for it at every stage of the data life cycle.

CyVerse has features that support data management throughout the life cycle. This section provides links to some of those features.

Data Generation

Most CyVerse users generate their data externally and bring it into CyVerse for analysis. For information on how to bring data into CyVerse, see the wiki page on Downloading and Uploading Data.

Through CyVerse analysis tools, you may generate new data on the CyVerse Data Store.

For analyses run in the Discovery Environment, the data are stored in the Analyses folder in your home directory (unless you specify a different output directory). You can use the Analyses Window to view the parameters associated with any output data created in the Discovery Environment (see Using the Analyses Window).

Data Analysis

CyVerse offers several platforms for analyzing data, each with features that support FAIR data.

If you are new to CyVerse, start with the Discovery Environment Manual or the Atmosphere Manual.

Discovery Environment (DE) features for reproducible science include:

  • The DE stores metadata on every analysis run, including input and output files, time run, who ran it, and all parameters. Any analysis can be relaunched using the same parameters.
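To make the idea of stored run metadata concrete, here is a minimal sketch of what such a record might contain and how a relaunch with the same parameters could reuse it. The class, field names, app name, and file paths are all hypothetical and chosen for illustration; this is not the DE's actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AnalysisRecord:
    """Hypothetical record of one analysis run (not the DE's real schema)."""
    app_name: str
    user: str
    started_at: datetime
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    parameters: dict[str, str] = field(default_factory=dict)


def relaunch(record: AnalysisRecord, user: str) -> AnalysisRecord:
    """Describe a new run that reuses the app, inputs, and parameters of an earlier run."""
    return AnalysisRecord(
        app_name=record.app_name,
        user=user,
        started_at=datetime.now(timezone.utc),
        inputs=record.inputs,
        outputs=(),  # the new run has not produced any output yet
        parameters=dict(record.parameters),  # copy, so the old record is untouched
    )


# Illustrative values only; the app name and paths are made up.
original = AnalysisRecord(
    app_name="example-aligner",
    user="alice",
    started_at=datetime(2020, 1, 15, 9, 30, tzinfo=timezone.utc),
    inputs=("/iplant/home/alice/reads.fastq",),
    outputs=("/iplant/home/alice/analyses/run1/aligned.bam",),
    parameters={"threads": "4", "min_quality": "20"},
)
rerun = relaunch(original, user="bob")
print(rerun.parameters)
```

Because the record keeps inputs and parameters together with who ran the analysis and when, a second person can reproduce the run without having to reconstruct its settings by hand.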

 

Data Publication

See the page on Publishing Data through the Data Commons.

Data Discovery

Search:

All data in the CyVerse Data Store are indexed using Elasticsearch. The Discovery Environment has an advanced search interface that returns results for all data you have permission to see (your own data, data shared with you, and public data). The Data Commons has a simple search interface that returns results from all public data in the /iplant/home/shared directory, that is, all data in the Data Commons.
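The difference between the two search scopes can be sketched as two visibility rules applied to the same set of items. This is only an illustration: the Item class, the user names, and the paths under /iplant/home are hypothetical, and a plain substring match on the path stands in for the real Elasticsearch query.

```python
from dataclasses import dataclass

DATA_COMMONS_ROOT = "/iplant/home/shared"


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for an indexed file or folder."""
    path: str
    owner: str
    shared_with: frozenset[str] = frozenset()
    public: bool = False


def visible_in_de(item: Item, user: str) -> bool:
    """DE scope: your own data, data shared with you, and public data."""
    return item.public or item.owner == user or user in item.shared_with


def visible_in_data_commons(item: Item) -> bool:
    """Data Commons scope: public data under /iplant/home/shared."""
    in_root = item.path.startswith(DATA_COMMONS_ROOT + "/")
    return item.public and in_root


def search(items, term, is_visible):
    """Substring match on the path; a placeholder for a real index query."""
    term = term.lower()
    return [i.path for i in items if term in i.path.lower() and is_visible(i)]


items = [
    Item("/iplant/home/alice/maize_reads.fastq", owner="alice"),
    Item("/iplant/home/bob/maize_notes.txt", owner="bob",
         shared_with=frozenset({"alice"})),
    Item("/iplant/home/carol/maize_private.csv", owner="carol"),
    Item("/iplant/home/shared/example_project/maize_genome.fa",
         owner="carol", public=True),
]

de_hits = search(items, "maize", lambda i: visible_in_de(i, "alice"))
commons_hits = search(items, "maize", visible_in_data_commons)
print(de_hits)
print(commons_hits)
```

In this sketch, the user alice sees her own file, the file bob shared with her, and the public file when searching with the DE rule, while the Data Commons rule returns only the public file under /iplant/home/shared.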

Metadata:

The best way to make your data more discoverable is to use metadata. If you publish data with a DOI through the Data Commons, you are required to add the DataCite metadata template, but you can also add custom metadata. If you are the owner of a Community Released Data folder, you are required to add the Dublin Core metadata template to the parent folder, but you should also add it to any relevant sub-folders. 
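Before asking for a folder to be released, it can help to check that the parent folder and each relevant sub-folder carry the metadata you intend to provide. The sketch below treats metadata as simple attribute-value pairs keyed by Dublin Core element names. The REQUIRED set, the folder paths, and the values are assumptions made for illustration; consult the actual CyVerse template for the fields it requires.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
)

# Assumption: this subset is illustrative, not the CyVerse template's rule.
REQUIRED = ("title", "creator", "description", "rights")


def missing_elements(metadata, required=REQUIRED):
    """Return the required element names that are absent or blank."""
    return [name for name in required if not metadata.get(name, "").strip()]


# Hypothetical folders and values, for illustration only.
folders = {
    "/iplant/home/shared/example_project": {
        "title": "Example project",
        "creator": "A. Researcher",
        "description": "Sequencing reads and assemblies for an example study.",
        "rights": "CC0",
    },
    "/iplant/home/shared/example_project/raw": {
        "title": "Raw reads",
    },
}

report = {path: missing_elements(md) for path, md in folders.items()}
for path, missing in report.items():
    status = "complete" if not missing else "missing: " + ", ".join(missing)
    print(path, "->", status)
```

Here the parent folder passes the check, while the sub-folder is flagged because it has only a title.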

Data Reuse

 

External Resources:

If you must use Excel spreadsheets, read this first: https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989

https://dataoneorg.github.io/Education/ 

https://www.dataone.org/best-practices

https://www.force11.org/group/joint-declaration-data-citation-principles-final

https://www.nsf.gov/pubs/2018/nsf18041/nsf18041.jsp 

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510

https://www.biorxiv.org/content/early/2018/09/16/418376

http://terraref.org/articles/existing-data-standards-and-tools/

 

 

 
