The applications listed here are available for use in the Discovery Environment and are documented in: Discovery Environment Manual.

Discovery Environment Applications List

Rationale and background:

Understanding how your storage space is being used is a key step in managing data.

This app, DataHog, builds a database of files stored in iRODS collections (such as the CyVerse Data Store), Amazon S3 buckets, or directories on your device, and lets you search, sort, and compare them. It reports file sizes, file types, and duplicated files.


The launch page offers five options for importing file data into DataHog:

  1. iRODS: Use the iRODS API to import data from a specific collection. The options for importing files from the CyVerse Data Store are prefilled.
  2. .datahog File: Upload a .datahog file containing file data. These files can be generated by a Python script that you can download and run on any machine.
  3. CyVerse: Use the CyVerse file search API to import any data stored in the Data Store. This method does not currently support exact duplicate matching, and may be slower than iRODS in some cases.
  4. S3 Bucket: Use your AWS access keys to import an S3 bucket, or a specific directory from one.
  5. Restore Database: If you previously backed up a DataHog database, you can upload it to restore your data.
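
To give a rough idea of the kind of file data an import collects, the sketch below walks a local directory and records each file's relative path, size, and checksum. It is an illustration only: it is not the downloadable DataHog script, it does not produce the .datahog format, and the function name `scan_directory`, the record fields, and the choice of SHA-256 are all assumptions made for this example.

```python
import hashlib
import os


def scan_directory(root):
    """Walk `root` and return one record (dict) per regular file.

    Hypothetical helper for illustration; the field names are assumptions,
    not the layout of a real .datahog file.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            records.append({
                "path": os.path.relpath(path, root),
                "size": os.path.getsize(path),
                "checksum": digest.hexdigest(),
            })
    return records
```

Computing a checksum requires reading every byte of every file, which is one reason a scan of a very large directory can be slow.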

Depending on how many files are being scanned, the import process can take a few minutes to complete. Extremely large directories (millions of files) may take much longer. You can close the tab and check back later; the import does not require the page to stay open.

Once the import process for your first file source is complete, you will have access to four tabs:

  1. Summary: View a summary of each of your file sources, including various file rankings and visualizations.
  2. Browse Files: Explore the folder structure for each of your file sources, or search your files by name, regular expression, or date and size filters. Click any column header to sort the table by that value.
  3. Duplicated Files: View a list of files with identical contents. By default, this page uses checksums to compare files, but file sizes or names can also be used. Each column header can be clicked to sort the table by that value.
  4. Manage File Sources: Import a new file source, remove an existing one, or download a backup of the current file database.
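
The idea behind the Duplicated Files tab can be sketched as grouping file records by a shared value. The example below is a minimal illustration, not DataHog's actual implementation: the function name `find_duplicates` and the record fields (`path`, `size`, `checksum`) are assumptions chosen for this sketch. Grouping by checksum finds files with identical contents, while grouping by size or name gives a looser match.

```python
from collections import defaultdict


def find_duplicates(records, key="checksum"):
    """Group file records by `key` and return groups with more than one file.

    Hypothetical helper: `records` is a list of dicts that each contain
    a "path" entry and the field named by `key`.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record["path"])
    # Keep only values shared by two or more files.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


# Made-up records for illustration only.
records = [
    {"path": "a.txt", "size": 5, "checksum": "aaa"},
    {"path": "copy/a.txt", "size": 5, "checksum": "aaa"},
    {"path": "b.txt", "size": 5, "checksum": "bbb"},
]

by_checksum = find_duplicates(records)          # [["a.txt", "copy/a.txt"]]
by_size = find_duplicates(records, key="size")  # all three files share size 5
```

Matching on size alone can report files that merely happen to be the same length, as `b.txt` shows here, which is why checksum comparison is the stricter test.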

Mandatory arguments



Chris Klimowski (UA Data Science Institute: Data7)

