2014.03.27 BIEN db

BIEN Database

March 27, 2014

Participants

Aaron, Brad, Mark, Martha

Agenda

Review progress and address questions on quantitative validations (Aaron, 50 min.)
Outline next steps (5 min.)

Previous Week

Aaron's To Do List
Write the additional plot query assigned last week (#19)

Recall that Aaron will complete the validations using the current schema.

Remaining (multi-week) validation work for Aaron:
1. Write the specimen output queries
2. Complete all plot input queries, including modifying Brad's FIA input queries
3. Write the specimen input queries
4. Validate all plot datasets
5. Validate all specimen datasets

Notes

Specimen input queries are finished.
Specimen output queries are about half done.

Order of work for Aaron

First
- Specimen input validation queries are done, but Aaron needs to switch them to use the concatenated taxon name.
- Write output queries for specimens (estimated 1-2 days)
Second
- TEAM: rename columns (0.5-1 day), denormalize (0.5-1 day)
- Madidi: columns already renamed, just needs denormalizing (0.5-1 day)
- So Aaron estimates both will be done in 2-3 days.
Third
- Write the VegCore input queries for plots (maybe 3 days).

Decisions

plots aggregating validations

won't denormalize SALVIAS because we already have input queries for it (Brad)
validate FIA last because it's a special case (Brad)

specimens aggregating validations

OK to run NY validations when writing specimens output queries instead of at the end with the other specimens datasources (Brad)
when writing specimens output queries based on NY input queries, treat query name as authoritative rather than query implementation (Brad)
use taxonoccurrence as the main specimen table
use concatenated taxon name instead of concatenating the ranks, since not all specimens datasources provide the ranks

new-style import

needs to include the denormalization of normalized datasources

NY

use artificial key as pkey instead of removing rows that are missing an accessionNumber (Brad)
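
A minimal sketch of this decision, assuming the NY rows are available as dicts: every row gets a sequential artificial key, so rows missing an accessionNumber are kept rather than removed. The key column name row_num, the helper name, and the sample rows are hypothetical and not taken from the NY data.

```python
def add_artificial_key(rows, key_name="row_num"):
    """Return copies of rows, each with a sequential artificial primary key.

    No row is dropped, even when its accessionNumber is missing.
    """
    return [{**row, key_name: i} for i, row in enumerate(rows, start=1)]


# Hypothetical sample rows; the second one has no accessionNumber.
rows = [
    {"accessionNumber": "A-1", "taxon": "Quercus alba"},
    {"accessionNumber": None, "taxon": "Acer rubrum"},
]
keyed = add_artificial_key(rows)
print([r["row_num"] for r in keyed])  # [1, 2]
```

Because the key does not depend on accessionNumber, the row count before and after keying stays the same, which is what the validations would compare.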

To Do for Aaron

aggregating validations

finish specimens output queries
- use concatenated taxon name instead of concatenating the ranks
- in #1, use taxonoccurrence instead of location as the main specimen table
run specimens output queries on NY to test them
denormalize normalized plots datasources: TEAM, Madidi
write denormalized plots input queries
finish fixing plots output queries
validate plots datasources: SALVIAS, VegBank, CVS, TEAM, Madidi, CTFS, FIA
validate specimens datasources

new-style import

denormalize normalized plots datasources: TEAM, Madidi, later SALVIAS

NY

use artificial key as pkey instead of accessionNumber

FIA

remap whatever is currently mapped to locationName to a suffix of locationName instead (locationName itself corresponds to plotCode below)
map INVYR to a suffix of authorEventCode (authorEventCode itself corresponds to plotCensusCode below)
add plotCode, plotCensusCode derived columns:
plotCode = CONCAT_WS('_', STATECD, COUNTYCD, PLOT)
plotCensusCode = CONCAT_WS('_', STATECD, COUNTYCD, PLOT, INVYR)
- mapped to locationName and authorEventCode, respectively
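
As a minimal sketch of the two derived columns above, the following Python mirrors the CONCAT_WS expressions. The helper names and the sample values are hypothetical. The sketch assumes the SQL behavior of CONCAT_WS, which skips NULL arguments rather than returning NULL.

```python
def concat_ws(sep, *parts):
    """Join parts with sep, skipping None, as SQL CONCAT_WS skips NULLs."""
    return sep.join(str(p) for p in parts if p is not None)


def derive_codes(row):
    """Return (plotCode, plotCensusCode) for one FIA row given as a dict."""
    plot_code = concat_ws("_", row["STATECD"], row["COUNTYCD"], row["PLOT"])
    plot_census_code = concat_ws(
        "_", row["STATECD"], row["COUNTYCD"], row["PLOT"], row["INVYR"]
    )
    return plot_code, plot_census_code


# Hypothetical sample row; the values are illustrative only.
sample = {"STATECD": 4, "COUNTYCD": 19, "PLOT": 123, "INVYR": 2005}
plot_code, plot_census_code = derive_codes(sample)
print(plot_code)         # 4_19_123
print(plot_census_code)  # 4_19_123_2005
```

Since plotCensusCode is plotCode extended by INVYR, each census code starts with the code of its plot, which is consistent with mapping INVYR to a suffix of authorEventCode.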