2013.12.12 BIENdb

BIEN Database

December 12, 2013

Participants

Aaron, Brad, Ramona, Martha

Agenda

1) Attributions & conditions for use (5 min.)

  • Is this "done for Dec."?

2) Validations (45 min.)

  • CVS (w/ Mike, Bob)
  • GBIF (w/ Brad)
  • FIA (w/ Aaron)
  • Do these have egregious problems?
    • ARIZ
    • Madidi
    • MO
    • BIEN2 Traits
  • Do people agree these do not have egregious problems?
    • UNCC
    • TEX
    • U

3) Derived Data Formulas (w/Brad) (5 min.)
4) Reminder of planning call Tuesday, Dec. 17th at 9 AM Pacific, 10 AM Tucson, noon Eastern time

To Do (From last week's call)

Brian, Brad, Bob, Mark

  • Review the outstanding issues and features in the Data Sources table (https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Datasource_validation_status) and flag them as "egregious" or "not egregious" so Aaron knows which to fix by December 20. For any "egregious" items, follow up with an email to the bien-db mailing list to inform everyone in case raging debate should ensue.

Brad

  • Send Aaron the final set of queries for the quantitative (aggregating) validations.

Aaron

  • CVS: Work with Bob and Mike to complete the "record spot check" validations using the current approach (input and output tabs).
  • GBIF, FIA, BIEN2 traits: Send Brad just the "output" information for record spot checking.
  • Run the aggregating queries on each source (dependent on Brad providing the queries)
  • Fix any problems deemed "egregious" in data sources (dependent on 3B's and Mark identifying the egregious ones).

Notes

Attributions & conditions for use

  • Is this "done for Dec."?
  • A data extract would have the information in each row.
    • So the information is there and is in the normalized database.
    • Could do a "select distinct" on the data source and then create a separate file with the list of providers to be attributed for any extract.
  • Brad will check if all contributors are contained within the database.
    • SALVIAS is a secondary data provider.
    • The most important providers are the PIs associated with a project.
      • The collector of the plot
      • The project PI
    • Aaron thinks locationEventContributor contains the PI information for VegBank.
    • This may not be an attribution issue per se, but Brad wants to check it.
  • Brad will run some checks by Friday to verify that the primary providers of plot data are in the database.
    • He'll also check for accurate capture of projects.
      • He will cross-check with Bob if necessary.
    • Brad will sign off via email.
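
The "select distinct" idea noted above could look roughly like the sketch below. It is an illustration only, not the actual BIEN schema: the table name `extract_rows`, the column name `datasource`, and the toy rows are all hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

def list_providers(conn, table="extract_rows", column="datasource"):
    """Return the sorted, distinct, non-null provider names in an extract.

    `table` and `column` are hypothetical names, not the real BIEN schema.
    """
    query = (
        f"SELECT DISTINCT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL ORDER BY {column}"
    )
    return [row[0] for row in conn.execute(query)]

def write_attribution_file(providers, path):
    """Write one provider per line to a separate attribution file."""
    with open(path, "w", encoding="utf-8") as f:
        for name in providers:
            f.write(name + "\n")

# Toy data standing in for a data extract (one data source per row).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extract_rows (id INTEGER, datasource TEXT)")
conn.executemany(
    "INSERT INTO extract_rows VALUES (?, ?)",
    [(1, "CVS"), (2, "GBIF"), (3, "CVS"), (4, "FIA"), (5, None)],
)
providers = list_providers(conn)
```

The resulting list can be passed to `write_attribution_file` so that the attribution list ships alongside the extract as its own file.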

CVS

  • Martha to remind Bob to review and sign off on CVS validation extract.

GBIF

  • In Brad’s court.
  • BB: Needs guidance from Aaron.
  • A: Look at the input tab he provided. Compare the output to that.
  • Discussion of how GBIF data is organized.
  • GBIF is specimen data, so easy to validate.
  • Brad prefers to query the GBIF MySQL table.
  • Aaron to point Brad to the MySQL table.
  • Brad will review and sign off.
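
The comparison Aaron describes (checking the output against the input tab) could be sketched as a field-by-field diff of one record. This is only an illustration: the field names, the sample values, and the mapping from input columns to output columns are hypothetical and would have to come from the real extract.

```python
def spot_check(input_row, output_row, field_map):
    """Compare one source record with its counterpart in the database extract.

    field_map maps input column names to output column names (hypothetical).
    Returns a list of (input_field, input_value, output_value) mismatches.
    """
    mismatches = []
    for in_field, out_field in field_map.items():
        in_val = input_row.get(in_field)
        out_val = output_row.get(out_field)
        # Compare as stripped strings so "12345" and 12345 are not flagged.
        if str(in_val).strip() != str(out_val).strip():
            mismatches.append((in_field, in_val, out_val))
    return mismatches

# Hypothetical specimen record, before and after loading.
field_map = {"catalog_no": "catalogNumber", "species": "scientificName"}
input_row = {"catalog_no": "12345", "species": "Quercus alba"}
output_row = {"catalogNumber": 12345, "scientificName": "Quercus rubra"}
result = spot_check(input_row, output_row, field_map)
```

Here only the species name differs, so `result` holds a single mismatch for the `species` field.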

FIA

  • Aaron just sent the extract to Brad.
  • Now FIA is in Brad’s court.
  • Since this is plot data, it will take longer to validate. Brad needs to think about how to do it. Will email Aaron with any questions.
  • Brad will need time in January to do the thorough check against the actual FIA db.
  • Can do limited spot checking now.

UNCC, U, TEX

  • Aaron to send extracts for UNCC and U to Brad.
  • Do not send a TEX extract.
  • There's no point since there were a number of problems identified and they haven't been fixed.

What about data sources that have not been signed off on?

  • What are the options?
    • Pull data sources that have not been signed off on.
    • Flag them.
  • Brad will review ALL datasets and categorize them as: Release, withhold, flag as having problems (next week)
  • Aaron to send BIEN2 traits extract to Brad.
  • Quantitative checks still need to be done on all data sources.
    • Quantitative checks are the same for all specimens.
    • Brad will do them, but not until January.
    • Aaron to add a separate table on the wiki to track the quantitative validation checks.

Derived Data Formulas (TNRS names)

  • A: Since these are derived, this could wait until Jan.
    • BB: No. This has high potential to impact data validity. It must be checked now.
    • This item arose from Bob’s spot check validations.
  • Brad will review the emails on the TNRS derived data formulas.
  • He will report on seriousness by Friday.

Other

  • Aaron will create separate wiki page (one row per source) with a link to the current validation extract.
  • Aaron and Brad will discuss workflow for quantitative validations in Jan.

Decisions

  • Brad will need time in January to do the thorough check against the actual FIA db.
  • Quantitative checks are the same for all specimens.
    • Brad will do them, but not until January after discussions with Aaron.

To Do

Attributions

  • Brad (By Friday):
    • Verify primary provider of plot data is in the database.
    • Verify accurate capture of "projects".
    • Sign off via email (or describe any problems found).

Validations

CVS

  • Martha: Remind Bob to review and sign off on CVS validation extract. (Aaron already emailed Mike who will look at it tomorrow.)
  • Bob, Mike: Validate the extract and notify the group of the status via the mailing list.

GBIF

  • Aaron: Point Brad to the GBIF MySQL table.
  • Brad: Validate GBIF and report the status via the mailing list.

FIA

  • Brad: Will do limited spot check validation.

UNCC, U, TEX

  • Aaron: Send extracts for UNCC and U to Brad.
    • Do not send a TEX extract (since the problems haven't been fixed).

BIEN2 traits

  • Aaron: Send extract to Brad.

All data sources

  • Brad (next week): Will review ALL datasets and categorize them as: Release, withhold, flag as having problems.

TNRS derived data formulas

  • Brad (by Friday): Will review the emails on the TNRS derived data formulas.
    • He will report on seriousness of this by Friday.

Other

  • Aaron: Will create separate wiki page (one row per source) with a link to the current validation extract.