2014.04.10 BIEN db

BIEN Database

April 10, 2014

Participants

Aaron, Brad, Mark, Martha

Agenda

  • Switch meetings to Wednesdays? (Martha, 5 min.)
  • Review progress and address questions on quantitative validations (Aaron, 45 min.)
  • Outline next steps (5 min.)

Previous Week's To Do for Aaron

aggregating validations

  1. specimens queries
    1. implement workaround for the slowdown in query #12
    2. run pipeline on NY to generate diffs
  2. plots queries
    1. write denormalized plots input queries, using VegBank as the example datasource
    2. finish fixing plots output queries
  3. validate datasources
    1. SALVIAS
    2. denormalized plots datasources: VegBank, CVS, CTFS
    3. specimens
    4. FIA (special case, with separate input queries)
    5. normalized plots datasources: TEAM, Madidi
      1. denormalize
      2. validate

Notes

TNRS Discussion

  • The approach Andy suggested of cloning the TNRS production server to create a dev server for Brad to use is a good approach -- more reliable than building the dev server from scratch.
    • Martha will ask Nicole to document this approach for future reference.
    • Brad will document what needs to be done to add a new name source.
  • Nicole has concerns about whether the TNRS GUI will automatically display the new name source correctly. Ray said it should.
    • Brad will let Nicole, Naim, Martha know ASAP if it does or doesn't display correctly in the UI when he gets to that step.
    • If there's a problem with the UI, Martha will get someone to fix it, but that won't hold up BIEN db work since the names could be scrubbed through the dev server without UI if necessary.

Specimen Queries

  • Specimen input queries were written last week. 
  • Still working on determining if they are correct.
  • Query 12 works now.
  • Specimen pipeline now runs.

Results of running NY specimen queries

  • Table taxonverbatim table stores taxonomic name as provided from the source.
  • Taxon name can be stored in 3 ways:
    • Taxonomicname - concatenated
    • Taxonname – genus plus specific epithet
    • Parsed out - family, genus, specific_epithet
  • Subspecies isn’t included in the schema in the taxonverbatim table.
    • Add subspecies to taxonverbatim table.
  • What is stored in taxonlable table?
    • Verbatim names
  • Taxonverbatim, taxonlabel, taxonconcept only store verbatim name data, not name data from TNRS.

Is the TNRS resolved name stored in the VegBIEN table?

  • No
  • There is a separate table for the TNRS name.
  • Is this part of the VegBIEN schema?
  • No.
  • It is an auxiliary table since it is derived data.
  • Same is true for the geoscrubbed data.
  • There is a separate namespace for those tables, specifically tnrs and geoscrub.
  • Recall that there was a difference of opinion about where these derived data, esp. the TNRS data should be stored.
    • Brad thought they (at least TNRS names) should be stored in the VegBIEN database.
    • Mark thought they should be stored outside the VegBIEN database.
    • We have landed with them being stored outside VegBIEN.
    • Is this workable?
  • The VegBIEN model was built with the intent to hold the resolved names.
    • So given where we’ve landed, some of the tables in the VegBIEN schema are irrelevant.
    • It's not necessary to normalize the verbatim names into taxonlabel, taxonconcept (but not suggesting a schema change now).
  • When we build the analytical tables we will need to query the tnrs and geoscrub tables.
    • Need to include these auxiliary tables in the ERD.
  • We can live with the TNRS and geoscrubbed names being stored in auxiliary tables. It will work.
    • Taxon determination and taxon verbatim are the only necessary tables in the VegBIEN schema given how this is being done (but not suggesting a schema change now).

Going through the results of specimen queries on NY that had diffs

  • Queries 6,7
    • Adding subspecies to taxonverbatim should resolve the diff in 6,7.
    • Plus adding the subspecies filter (that was removed from 4, 5)
  • Query 4
    • Reiteratied that the quantitative outputs need to be included to diagnose problems with the queries.
    • Fixed the input query. There was a copy/paste error in the query Brad wrote.
  • Query 5
    • Same problem as 4. So fixed.
  • Added the clause to 6,7.
  • Aaron needs to provide head, tail, middle of query result sets that generate lists -- for the purpose of troubleshooting problems in queries.
  • Query 13
    • a set of nested queries
    • recommendation is for Aaron to run each individual query manually to determine if one of them is the cause of the diff.

Wrap up

  • It's important for Aaron to send out results between calls so people can suggest fixes. We can't afford to wait for the calls.
  • Brad unavailable until Monday.

To Do for Aaron

VegBIEN schema

  • add taxonverbatim.subspecies
  • add TNRS and geoscrub tables to ERD

specimens aggregating validations

  1. fix input queries #4,5: remove subspecies IS NOT NULL filter
  2. fix input queries #6,7: add subspecies IS NOT NULL filter
  3. fix output queries #6,7: use subspecies instead of the concatenated taxonomic name