To Do - Brad BIEN db

Task	Status	Completed
Cultivated species (target mid-May)
develop algorithm	complete 4/25	4/25
locate data sources (Brad completed US, Canada, Peru, Bolivia, Ecuador; ongoing by Bob and Brian)	complete by Brad 4/25, but ongoing by others	4/25
instruct Aaron on how to put this "native status resolver (NSR)" into the BIEN db workflow
Taxonomic synonyms
acquire additional name sources	complete	3/27
develop improved algorithm for resolving conflicts among sources and flagging problem names	complete	4/3
incorporate into TNRS	complete	4/18
document how to add a new taxonomic source
provide instructions to Aaron about how to rerun names through the new TNRS	complete	5/13
Analytical tables (target end of April)
specify	in progress (view_full_occurrence done)
write queries (shifted to Aaron instead of Brad)
provide sample table to iPlant (from BIEN 2)	complete	3/19
Document where the detailed instructions for Aaron are located

2014.04.03

1) TNRS. I have finished building and validating the new TNRS database, including The Plant List, and have made related changes to the TNRS application scripts. At this stage, all that is required before moving to production is to test the complete database + application on a development server. In the past, this has been a half-day task, but unfortunately iPlant no longer has a functional development installation of the TNRS, and there is no one available to set one up. Consequently, I am doing this myself from scratch, including installing all the software dependencies. My prior training as a botanist did not prepare me fully for Unix system administration, so I am having to learn as I go.

2) Cultivated species. I continue to add sources to the reference database when I am not configuring Tomcat, Java, Ruby, etc. on the TNRS development server. All sources are standardized to a common staging schema before normalization; most of the work involved in loading a new source is therefore mapping, parsing and loading to the staging schema. Once staged, the sources are confederated using a single normalization workflow recycled from the TNRS. You can review the sources here. I am nearly at the end of my list of sources and will soon begin pestering you all aggressively for more.
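
For anyone unfamiliar with the staging step, here is a minimal sketch of what "mapping, parsing and loading to the staging schema" can look like. It is an illustration only, not the actual loader: the staging column names, the `stage_rows` function and the sample source are all hypothetical.

```python
import csv
import io

# Hypothetical staging columns; the real staging schema is not described in these notes.
STAGING_COLUMNS = ["source", "country", "taxon", "status"]


def stage_rows(source_name, raw_text, column_map, delimiter=","):
    """Map one raw source file onto the common staging columns.

    column_map maps a staging column name to the column name used in the raw source.
    Staging columns with no mapping are left blank.
    """
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=delimiter)
    staged = []
    for row in reader:
        # Start every record with the full set of staging columns.
        record = dict.fromkeys(STAGING_COLUMNS, "")
        record["source"] = source_name
        for staging_col, raw_col in column_map.items():
            # Trim stray whitespace, a common problem in checklist exports.
            record[staging_col] = (row.get(raw_col) or "").strip()
        staged.append(record)
    return staged


# Made-up source with its own column names.
raw = "Species,Nation,Origin\nZea mays ,Peru,cultivated\n"
rows = stage_rows(
    "example_checklist",
    raw,
    {"taxon": "Species", "country": "Nation", "status": "Origin"},
)
```

Under this design each new source needs only its own `column_map` (plus any source-specific parsing), after which every source can go through the same normalization workflow.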

3) Analytical tables. Not yet started.

Brad

2014.03.27

1. TNRS

The new TNRS database, including The Plant List, is ready and validated. Validation took longer than expected. During validation, I discovered some content in the current (online) version of the TNRS that was missing from the new build. After some digging through my scripts and notes, I rediscovered a bunch of scripts and content that I had used to build the most recent production release of the TNRS, but hadn’t rolled into the TNRS pipeline. As I recall, it was a last-minute fix prior to the last BIEN meeting. The “extra” scripts were Tropicos-specific fixes that (1) add additional accepted names pillaged by Peter querying Tropicos from behind their firewall, and (2) apply an algorithmic correction to ComputedAcceptance that allows child taxa of synonym genera to inherit the correct genus. I have now run these scripts on the new database and added them to the TNRS pipeline.
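
To illustrate the second fix, here is a minimal sketch of one possible reading of "inherit the correct genus": a species filed under a synonym genus is assigned the genus reached by following the synonym links. This is not the actual ComputedAcceptance script. The function names, the data layout and the genus names are all hypothetical placeholders.

```python
def accepted_genus(genus, synonym_of):
    """Follow synonym links until reaching a genus that is not itself a synonym.

    synonym_of maps a synonym genus to the genus it is a synonym of.
    The 'seen' set guards against circular synonymy in the source data.
    """
    seen = set()
    while genus in synonym_of and genus not in seen:
        seen.add(genus)
        genus = synonym_of[genus]
    return genus


def inherit_genus(species_to_genus, synonym_of):
    """Return, for each species, the genus it should inherit."""
    return {
        species: accepted_genus(genus, synonym_of)
        for species, genus in species_to_genus.items()
    }


# Placeholder names, not real taxa.
synonym_of = {"Oldgenus": "Newgenus"}
species_to_genus = {
    "Oldgenus alba": "Oldgenus",
    "Newgenus rubra": "Newgenus",
}
result = inherit_genus(species_to_genus, synonym_of)
```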

The remaining steps are to (a) commit the revised scripts and content to GitHub, (b) launch and test a development instance of the TNRS using the new database, (c) move the development instance to production, and (d) add content about The Plant List to the TNRS website. In theory these last steps shouldn’t take more than a day, but I’m a bit challenged by having to learn command-line git from scratch. Previously I have only used svn from a GUI. Sorry to be so slow. I’m just a botanist, remember?

2. Cultivated species

No progress since last week, when I added a bunch of new sources. I’ll get back to this as soon as I complete the new release of the TNRS. To view the sources currently being used, see the Cultivated Species development wiki. As you can see, we need more sources.

3. Analytical tables

We’ve reached the point where Nirav and others at iPlant are beginning to work on the middle layer of services that will be used to query BIEN. They will need a stable schema representing the majority of the analytical tables we will be using. I’m going to start defining those schemas now, rather than wait for the core database to be completed.

Brad