2013.12.12 BIENdb
BIEN Database
December 12, 2013
Participants
Aaron, Brad, Ramona, Martha
Agenda
1) Attributions & conditions for use (5 min.)
- Is this "done for Dec."?
2) Validations (45 min.)
- CVS (w/ Mike, Bob)
- GBIF (w/ Brad)
- FIA (w/ Aaron)
- Do these have egregious problems? ARIZ, Madidi, MO, BIEN2 Traits
- Do people agree these do not have egregious problems? UNCC, TEX, U
3) Derived Data Formulas (w/Brad) (5 min.)
4) Reminder of planning call Tuesday, Dec. 17th at 9 AM Pacific, 10 AM Tucson, noon Eastern time
To Do (From last week's call)
Brian, Brad, Bob, Mark
- Review the outstanding issues and features in the *[Data Sources table|https://projects.nceas.ucsb.edu/nceas/projects/bien/wiki/Datasource_validation_status] and flag them as "egregious" or "not egregious" so Aaron knows which to fix by December 20.* For any "egregious" items, follow up with an email to the bien-db mailing list to inform everyone in case raging debate should ensue.
Brad
- Send Aaron the final set of queries for the quantitative (aggregating) validations.
Aaron
- CVS: Work with Bob and Mike to complete the "record spot check" validations using the current approach (input and output tabs).
- GBIF, FIA, BIEN2 traits: Send Brad just the "output" information for record spot checking.
- Run the aggregating queries on each source (dependent on Brad providing the queries)
- Fix any problems deemed "egregious" in data sources (dependent on the three B's (Brian, Brad, Bob) and Mark identifying the egregious ones).
Notes
Attributions & conditions for use
- Is this "done for Dec."?
- A data extract would have the information in each row.
- So the information is there and is in the normalized database.
- Could do a "select distinct" on the data source and then create a separate file with the list of providers to be attributed for any extract.
- Brad will check if all contributors are contained within the database.
- SALVIAS is a secondary data provider.
- The most important providers are the PIs associated with a project.
- The collector of the plot
- The project PI
- Aaron thinks locationEventContributor contains the PI information for VegBank.
- This may not be an attribution issue per se, but Brad wants to check it.
- Brad will run some checks by Friday to verify that the primary providers of plot data are in the database.
- He'll also check for accurate capture of projects.
- He will cross-check with Bob if necessary.
- Brad will sign off via email.
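The "select distinct" idea above could be sketched roughly as follows. This is only an illustration, not the BIEN implementation: the table name `extract`, the column name `datasource`, and both helper functions are hypothetical, and an in-memory SQLite database stands in for the real database.

```python
import sqlite3


def distinct_providers(conn, extract_table, source_column="datasource"):
    """Return the distinct, non-null data sources found in an extract table."""
    # Table and column names cannot be bound as SQL parameters, so check that
    # they are plain identifiers before interpolating them into the query.
    for name in (extract_table, source_column):
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    query = (
        f"SELECT DISTINCT {source_column} FROM {extract_table} "
        f"WHERE {source_column} IS NOT NULL ORDER BY {source_column}"
    )
    return [row[0] for row in conn.execute(query)]


def write_attribution_file(providers, path):
    """Write one provider per line to a separate attribution file."""
    with open(path, "w", encoding="utf-8") as handle:
        for provider in providers:
            handle.write(provider + "\n")


# Hypothetical extract: each row carries its own data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extract (datasource TEXT, taxon TEXT)")
conn.executemany(
    "INSERT INTO extract VALUES (?, ?)",
    [
        ("CVS", "Quercus alba"),
        ("GBIF", "Acer rubrum"),
        ("CVS", "Pinus taeda"),
        ("FIA", "Acer rubrum"),
    ],
)
providers = distinct_providers(conn, "extract")
```

Calling `write_attribution_file(providers, "attribution.txt")` would then produce the separate provider list to ship alongside the extract. Whether PIs and collectors need their own columns in that list depends on the checks Brad is running.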
CVS
- Martha to remind Bob to review and sign off on CVS validation extract.
GBIF
- In Brad’s court.
- BB: Needs guidance from Aaron.
- A: Look at the input tab he provided. Compare the output to that.
- Discussion of how GBIF data is organized.
- GBIF is specimen data, so easy to validate.
- Brad prefers to query the GBIF MySQL table.
- Aaron to point Brad to the MySQL table.
- Brad will review and sign off.
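The record spot check described above (compare the output to the input tab) might look something like the sketch below. It assumes the input and output records have already been mapped to shared field names; the field names `catalog_number`, `scientific_name`, and `country` and the function `compare_records` are hypothetical, not part of the actual validation extracts.

```python
def compare_records(input_rows, output_rows, key, fields):
    """Compare input and output records that share a key.

    Returns a list of (key_value, field, input_value, output_value) tuples,
    one per mismatch. A record missing from the output is reported once
    with the field set to "<record missing>".
    """
    output_by_key = {row[key]: row for row in output_rows}
    mismatches = []
    for row in input_rows:
        out = output_by_key.get(row[key])
        if out is None:
            mismatches.append((row[key], "<record missing>", None, None))
            continue
        for field in fields:
            if row.get(field) != out.get(field):
                mismatches.append((row[key], field, row.get(field), out.get(field)))
    return mismatches


# Hypothetical specimen records from the input tab and the database output.
input_rows = [
    {"catalog_number": "A1", "scientific_name": "Quercus alba", "country": "US"},
    {"catalog_number": "A2", "scientific_name": "Acer rubrum", "country": "US"},
    {"catalog_number": "A3", "scientific_name": "Pinus taeda", "country": "US"},
]
output_rows = [
    {"catalog_number": "A1", "scientific_name": "Quercus alba", "country": "US"},
    {"catalog_number": "A2", "scientific_name": "Acer rubra", "country": "US"},
]
mismatches = compare_records(
    input_rows, output_rows, "catalog_number", ["scientific_name", "country"]
)
```

An empty result would support a sign-off for the sampled records; any mismatch would be reported to the mailing list.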
FIA
- Aaron just sent the extract to Brad.
- Now FIA is in Brad’s court.
- Since this is plot data, it will take longer to validate. Brad needs to think about how to do it. Will email Aaron with any questions.
- Brad will need time in January to do the thorough check against the actual FIA db.
- Can do limited spot checking now.
UNCC, U, TEX
- Aaron to send extracts for UNCC and U to Brad.
- Do not send a TEX extract.
- There's no point since there were a number of problems identified and they haven't been fixed.
What about data sources that have not been signed off on?
- What are the options?
- Pull data sources that have not been signed off on.
- Flag them.
- Brad will review ALL datasets next week and categorize each as: release, withhold, or flag as having problems.
- Aaron to send BIEN2 traits extract to Brad.
- Quantitative checks still need to be done on all data sources.
- Quantitative checks are the same for all specimens.
- Brad will do them, but not until January.
- Aaron to add a separate table on the wiki to track the quantitative validation checks.
Derived Data Formulas (TNRS names)
- A: Since these are derived, this could wait until Jan.
- BB: No. This has high potential to impact data validity. It must be checked now.
- This item arose from Bob’s spot check validations.
- Brad will review the emails on the TNRS derived data formulas.
- He will report on the seriousness by Friday.
Other
- Aaron will create separate wiki page (one row per source) with a link to the current validation extract.
- Aaron and Brad will discuss workflow for quantitative validations in Jan.
Decisions
- Brad will need time in January to do the thorough check against the actual FIA db.
- Quantitative checks are the same for all specimens.
- Brad will do them, but not until January after discussions with Aaron.
To Do
Attributions
- Brad (By Friday):
- Verify primary provider of plot data is in the database.
- Verify accurate capture of "projects".
- Sign off via email (or describe any problems found).
Validations
CVS
- Martha: Remind Bob to review and sign off on CVS validation extract. (Aaron already emailed Mike who will look at it tomorrow.)
- Bob, Mike: Validate the extract and notify the group of the status via the mailing list.
GBIF
- Aaron: Point Brad to the GBIF MySQL table.
- Brad: Validate GBIF and report the status via the mailing list.
FIA
- Brad: Will do limited spot check validation.
UNCC, U, TEX
- Aaron: Send extracts for UNCC and U to Brad.
- Do not send a TEX extract (since the problems haven't been fixed).
BIEN2 traits
- Aaron: Send extract to Brad.
All data sources
- Brad (next week): Will review ALL datasets and categorize each as: release, withhold, or flag as having problems.
TNRS derived data formulas
- Brad (by Friday): Will review the emails on the TNRS derived data formulas.
- He will report on seriousness of this by Friday.
Other
- Aaron: Will create separate wiki page (one row per source) with a link to the current validation extract.