2014.05.22 BIEN db

BIEN Database

May 22, 2014

Notes

The regular call was canceled because Martha had already talked with Aaron at length on Tuesday to discuss the next set of work, and with Mark to catch up on things.

Disk Space Leak

The snapshots are being restored this week so the disk space leak can be tracked down beginning next week. It takes time for the snapshots to be pulled from the tape backup, so this is proving to be a slow step.

Taxon name validation

Meanwhile, Aaron is doing the scripting necessary to validate taxon names, including re-scrubbing them through the new version of TNRS. He and I talked for quite some time yesterday to clarify the tasks. We now have the work described in a set of issues in Redmine. The parent issue is #928 and the subtasks are linked from there. At a high level, the subtasks are:

  • Make the changes we spelled out when Bob did preliminary taxon name validation #915 and #916.
  • Include morphospecies name formation as part of the normalized DB workflow (it is currently part of the analytical_stem view) #927
  • Make the BIEN TNRS client store TNRS metadata as described last June #929.
  • Re-scrub the names as noted in steps 1-3 of #917 using the new version of TNRS (which requires that he first redo the schema for the table to store all matches, not just the best match from TNRS) #917
  • While the names are re-scrubbing, do the scripting to sort through all matches to find the best match and handle synonyms using the algorithm in steps 4-6 of #917. (details forthcoming from Brad)
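
To make the last two subtasks concrete, here is a minimal Python sketch of choosing one best match per submitted name once all matches are stored. It is not the algorithm from #917, whose details are still forthcoming. The field names (submitted, matched, score, accepted), the "highest score wins" rule, and the taxon names are all hypothetical placeholders, not the actual BIEN or TNRS schema.

```python
from collections import defaultdict

# Hypothetical rows from an "all matches" table: one row per
# (submitted name, candidate match). Names and fields are made up.
matches = [
    {"submitted": "Exemplum primun", "matched": "Exemplum primum",
     "score": 0.95, "accepted": None},
    {"submitted": "Exemplum primun", "matched": "Exemplum prunum",
     "score": 0.80, "accepted": None},
    # A synonym: the matched name points to a different accepted name.
    {"submitted": "Exemplum vetus", "matched": "Exemplum vetus",
     "score": 1.0, "accepted": "Exemplum novum"},
]


def best_matches(rows):
    """Return {submitted name: resolved name}, one entry per submitted name.

    Assumed rule (for illustration only): take the candidate with the
    highest score, then use its accepted name if it is a synonym.
    Ties go to the candidate that appears first in the input.
    """
    by_name = defaultdict(list)
    for row in rows:
        by_name[row["submitted"]].append(row)

    resolved = {}
    for name, candidates in by_name.items():
        top = max(candidates, key=lambda r: r["score"])
        resolved[name] = top["accepted"] or top["matched"]
    return resolved


result = best_matches(matches)
```

Storing every candidate first, as the schema change allows, means the selection rule can be changed and re-run without re-scrubbing the names.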

Aaron expects to be able to begin re-scrubbing names next Wednesday. He'll begin with a smaller test set of names that Brad will send. Re-scrubbing all the BIEN taxon names will take about a week, so we want to make sure things are running correctly before launching the whole enchilada.