2014.05.29 BIEN db

BIEN Database

May 29, 2014

Participants

Aaron Marcuse-Kubitza, Mark Schildhauer, Martha Narro

Agenda

  • Taxon name rescrub (Aaron)
  • Data dictionary (Martha)
  • Disk Space Leak (Aaron)

Notes

Taxon name scrubbing

  • Postpone Redmine issue 929 (change the TNRS client to store metadata). Add the metadata manually after scrubbing so that scrubbing can begin.
  • The full name scrubbing workflow can be finalized later.
  • Begin scrubbing names on Friday.
  • It's fine to submit as many as 100,000 names at a time, especially since Aaron will be the only person submitting names to the development server.
    • If any problems come up, contact Nicole.
    • Martha will send Nicole’s contact information.

Data Dictionary

  • Completing the Data Dictionary is the priority after the taxon name scrubbing is running.
  • The Data Dictionary is needed for the normalized BIEN3 database (VegBIEN) and its associated tables (such as TNRS and GeoScrub), i.e., all tables that may be used to create analytical tables (views).
  • If data terms have been borrowed from elsewhere (e.g., DwC, VegBank, Salvias, VegX), link to those definitions. This means Aaron will need to write definitions for far fewer terms: only those unique to BIEN3.
  • Begin by defining (or linking) the (approximately 60) terms needed for the BIEN3 viewFullOccurrence table.
  • After those are completed, move on to completing the Data Dictionary for the rest of BIEN3 and associated tables.
  • At an estimated pace of about 100 terms per day, a draft of the full Data Dictionary should take about 3-4 days.
  • The definitions need to be drafted and linked before June 9th (at least for terms in viewFullOccurrence and ideally for the entire Data Dictionary).

Decisions

  • Postpone Redmine issue 929 (change the TNRS client to store metadata) so taxon name scrubbing can begin on Friday. The full taxon name scrubbing workflow can be completed later, once its component parts are working.
  • It's fine to submit 100,000 names at a time for scrubbing on the development server.

Action Items

Aaron

  1. Begin scrubbing taxon names on Friday (expected to run 4 days or more).
  2. Define the Data Dictionary terms for VegBIEN and associated tables.
    1. First define the (approximately 60) terms needed for the BIEN3 viewFullOccurrence table. For terms defined in other namespaces (e.g., DwC, VegBank, Salvias, VegX), link to those definitions.
    2. After those are finished, move on to completing the Data Dictionary for the rest of BIEN3 and associated tables.
    3. The definitions need to be drafted and linked before June 9th (at least for terms in viewFullOccurrence and ideally for the entire Data Dictionary).
  3. For now, work on the disk space leak is lower priority than the items listed above.

Martha

  • Send Nicole’s contact information: Nicole Hopkins nicole@iplantcollaborative.org (DONE)
    • Please copy the BIEN list when communicating with her to keep everyone in the loop.
  • Send the DDL that Brad provided for the BIEN3 viewFullOccurrence table. (DONE)