BIEN_04Apr10

BIEN/iPToL Meeting

Toward a Taxonomic Name Resolution Service

Missouri Botanical Garden
St. Louis, MO
March 31-April 2, 2010

Meeting Overview

The past decade has seen explosive growth in large biological databases aggregated from multiple sources. Continental and global-scale data warehouses and networks such as GBIF (http://www.gbif.org/), SpeciesLink (http://splink.cria.org.br/), and REMIB (http://www.conabio.gob.mx/remib_ingles/doctos/remib_ing.html) provide access to millions of records from biological collections worldwide. Thousands of ecological inventories and species trait measurements are now available through portals such as VegBank (www.vegbank.org/), SALVIAS (www.salvias.net) and TraitNet (www.columbia.edu/cu/traitnet/). Electronic archives such as GenBank (www.ncbi.nlm.nih.gov/Genbank/) and TreeBase (www.treebase.org/) house millions of records of sequence data and phylogenies for hundreds of thousands of organisms. Such "mega-datasets" represent a major new tool for the study of biodiversity, and have made possible analyses at spatial and temporal scales unimaginable even a decade ago (e.g., Loarie et al. 2008, Weiser et al. 2007, García 2006, Peterson et al. 2002).

Unfortunately, the increasing use of mega-datasets has highlighted a long-standing impediment within the biological sciences: taxonomic uncertainty. Do two different names represent two species or one? Does the same name used in different data sets at different times refer to the same species? Given the tools currently available, accurately answering such questions for a large biological database, with hundreds or even thousands of taxon name strings, can be daunting, time-consuming and error-prone.

The purpose of this meeting is to discuss the development of a botanical Taxonomic Name Resolution Service (TNRS) at the Missouri Botanical Garden, in collaboration with the International Plant Names Index (IPNI) and iPlant. A TNRS is a suite of applications and associated data for automated and computer-assisted correction and standardization of taxonomic names. Specific actions include correcting misspellings and standardizing spelling variants in names and authors, updating synonyms to accepted names, and mapping concept relationships among sets of taxa. While much of the information needed to perform these tasks is already available electronically via sources such as TROPICOS (www.tropicos.org) and IPNI (www.ipni.org), using such information currently requires time-consuming, case-by-case inspection. Our goal is to facilitate the taxonomic standardization of very large biological datasets through machine-to-machine transfer of taxonomic information and maximally automated error detection and correction.
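
As a rough illustration of two of the actions listed above (correcting misspellings and updating synonyms to accepted names), the sketch below uses Python's standard difflib module. It is only a minimal example under stated assumptions: the name lists, the function resolve_name and the similarity cutoff are hypothetical and do not describe the design of the proposed TNRS or the interfaces of TROPICOS or IPNI.

```python
from difflib import get_close_matches

# Hypothetical reference data for illustration only; a real service would
# draw on an authoritative source rather than a hard-coded list.
ACCEPTED = ["Quercus alba", "Quercus rubra", "Acer saccharum"]
SYNONYMS = {"Acer saccharophorum": "Acer saccharum"}  # synonym -> accepted name


def resolve_name(raw, cutoff=0.85):
    """Return (matched_name, accepted_name), or (None, None) if nothing is close.

    The cutoff is an assumed similarity threshold, not a recommended value.
    """
    # Normalize whitespace and capitalization: "acer  SACCHARUM" -> "Acer saccharum"
    cleaned = " ".join(raw.split())
    if cleaned:
        cleaned = cleaned[0].upper() + cleaned[1:].lower()

    # Fuzzy-match against every known name string, accepted or synonym
    candidates = ACCEPTED + list(SYNONYMS)
    hits = get_close_matches(cleaned, candidates, n=1, cutoff=cutoff)
    if not hits:
        return (None, None)

    matched = hits[0]
    # If the matched string is a synonym, map it to its accepted name
    return (matched, SYNONYMS.get(matched, matched))


if __name__ == "__main__":
    for name in ["Quercus albba", "acer  saccharophorum", "Pinus strobus"]:
        print(name, "->", resolve_name(name))
```

A name that falls below the cutoff is returned unresolved rather than guessed, which leaves room for the computer-assisted (human-reviewed) step mentioned above.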

Meeting Schedule

Wed. March 31

Out-of-town participants arrive. Transfer to accommodations at Trelease House

Thursday April 1

Session 1: Why a TNRS?
8:00 am – 10:15 am
Moderator: Peter Jorgensen

Session overview. General introduction and overview of the goals and structure of the meeting. What is the taxonomic impediment? What is a Taxonomic Name Resolution Service (TNRS), who needs it, and why? Presentations by participants summarizing the challenge of taxonomy with respect to their own research. The goal of this session is to produce a broad justification of the need for a TNRS, including perceived needs within the larger community. Presentations should be roughly 15 minutes, allowing 5 minutes for questions.

8:00 – 8:10 Welcome and introduction (Peter Jorgensen, Missouri Botanical Garden)
8:10 – 8:35 Brad Boyle (BIEN/University of Arizona) – BIEN, SALVIAS and the case for a TNRS: New insights from large biodiversity datasets
8:35 – 9:00 Peter Jorgensen (MO/BIEN) – The Madidi Project: Ecological inventories and the challenge of taxonomy
9:00 – 9:25 Amy Zanne (UMSL/APWeb) – Taxonomy and large-scale ecological analyses
9:25 – 9:50 Bill Piel (iPlant/iPToL) – TBA
9:50 – 10:15 Peter Stevens (MO/APWeb/WoW) – Angiosperm Phylogeny Web and the Generic Synonymy initiative

10:15 – 10:30 Break

Session 2 – Lessons learned
10:30 am – 12 noon
Moderator: Bob Peet

Session overview. What has been done in the past to address the taxonomic impediment? What solutions are currently available for resolving nomenclature, synonymy, and taxonomic concepts (both data and cyberinfrastructure)? How are these solutions being made available, or not, to large biodiversity datasets? What has worked and what hasn't?

10:30 – 11:10 TROPICOS: what it is and what it is not; how Tropicos has solved name-matching problems in the past; what Tropicos can provide now; and what is needed, in terms of both cyberinfrastructure and population of the database, to provide more refined solutions in the future (Bob Magill/Chris Freeland, MO)
11:10 – 11:35 Other solutions and initiatives: TDWG, IRMNG, UBio, SALVIAS, CRIA/SpeciesLink, others (Brad Boyle)
11:35 – 12 noon Taxonomic concepts (Bob Peet, UNC/BIEN)

12 noon – 1 pm: Lunch

Session 3 – Defining a solution
1:00 pm – 5:00 pm
Moderator: Brad Boyle (or…Matt Jones?)

Session overview. What is needed? Define the data and cyberinfrastructure needed to solve the challenges outlined in Session 1. Identify short-term and long-term goals. Unlike the previous two sessions, this session will consist of a series of discussions structured around key topics. We may form breakaway discussion groups as needed.

1 pm – 3 pm Discussion topics:

  • Name matching
  • Synonymy
  • Taxonomic concepts
  • Data needs, especially capture of additional monographic and checklist data
  • Additional institutions and data sources (IPNI, etc.)
  • Data access and intellectual property concerns

3 pm – 3:20 pm: Break

3:20 – 4 pm Discussion topics:

  • Integration with existing initiatives (esp. TDWG)
  • Scalability and interoperability
    • GUIDs, markup
    • Architecture, web services, etc.

4:00 – 4:45 Additional topics & breakaway sessions.
4:45 – 5:00 Summarize, prioritize: long-term versus short-term goals

6:30 pm Dinner and libations, location TBA

Friday April 2

Session 4 – Implementation
8:00 am – 12 noon
Moderator: Matt Jones

Session overview. Define specific deliverables based on the list of goals from Session 3. Estimate the personnel and infrastructure needed and what they will cost, and identify potential sources of funding. Identify potential partners and collaborators. Draft a potential timeline and assign responsibilities.

8:00 – 8:30 Review of goals from Session 3
8:30 – 10:00 Requirements for implementation

  • Collaboration with existing initiatives
  • What is needed in terms of funding, personnel, hardware
  • Estimated costs
  • Potential sources of funding

10:00 – 10:15 Break

10:20 – 12 noon Draft plan of action items

  • Short- and long-term goals
  • List of deliverables & responsibilities
  • Timeline

12 noon – 1 pm: Lunch

Session 5 – Wrap up
1:00 – 2:30 pm
Moderator: Brad Boyle

Session overview. Summarize conclusions of meeting. Agree on immediate next steps.

1 pm – 2:30 pm Final discussion of near-term goals plus plans for additional meetings, if any. Summary and closing remarks.

2:30 pm Meeting over. Out-of-town participants depart St. Louis

Literature cited:

García, A., 2006. Using ecological niche modelling to identify diversity hotspots for the herpetofauna of Pacific lowlands and adjacent interior valleys of Mexico. Biological Conservation, 130(1): 25-46.
Loarie, S.R. et al., 2008. Climate Change and the Future of California's Endemic Flora. PLoS ONE, 3(6): e2502.
Peterson, A.T. et al., 2002. Future projections for Mexican faunas under global climate change scenarios. Nature, 416: 626-629.
Weiser, M.D. et al., 2007. Latitudinal patterns of range size and species richness of New World woody plants. Global Ecology and Biogeography, 16: 679-688.