APWEB2 Project Charter October 27, 2009

Project Title

Angiosperm Phylogeny Website: A Virtual Resource for Research and Education

Start Date

Jan 1, 2010

End Date

June 30, 2011

Project Justification

The Angiosperm Phylogeny Website (http://www.mobot.org/MOBOT/research/APweb) provides comprehensive and up-to-date information on seed plant evolutionary relationships and characters. It is a one-stop shop for information on how plants are classified, how they are related evolutionarily, and their morphological, chemical, and other characteristics. It includes information on the phylogeny, evolution and diversification of all 413 families and 58 orders of angiosperms as well as eleven families and four orders of gymnosperms. It gives researchers in systematics, ecology, and other fields convenient access to current and past literature on their group of interest, offers a novel synthesis of ideas, and serves as an educational tool for teachers and students alike. The site is designed to help in research on and teaching of seed plant phylogeny at a time when our knowledge of the major clades of seed plants, and of the relationships within and between them, is still somewhat in flux. Importantly, it helps biologists across disciplines communicate by being readily accessible and frequently updated; as phylogenies become clarified and new findings are made in anatomy, morphology, etc., they can be rapidly integrated into the Angiosperm Phylogeny Group system, a largely stable naming system.

The site consists of a series of pages, each characterizing a specific order and its associated families, together encompassing all extant angiosperms and gymnosperms. All well-supported clades at family level and above are included, as well as major clades within the larger families. The site also includes an extensive glossary (in effect a semi-ontology), a discussion of the characters, a bibliography, etc. There are hundreds of mainly original species distribution maps, links to images, etc.

However, the site needs to be improved, and there are three immediate and linked goals to make these improvements:

The information needs to be archived in a more structured fashion to make it readily available to users ranging from expert researchers through students to casual browsers.
Tools to display a complete genus-level synonymy for flowering plants need to be developed. This would have important implications for botany as a whole, as well as allowing researchers to produce lists of numbers of species and genera for all nodes in the tree.
Finally, new informatics tools are needed to visualize evolutionary change more dynamically, to see the distributions of characters across all angiosperms, how they change, and the like.
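
As an illustration of the second goal, the following is a minimal sketch of how a genus-level synonymy could be used to count genera and species at every node in the tree. It is written in Python purely for illustration; the clade names, genus names, species counts, and function names are all hypothetical and do not come from APWEB.

```python
from collections import defaultdict

# Hypothetical toy data; none of these names or counts come from APWEB.
PARENT = {                      # child clade -> parent clade
    "FamilyA": "OrderX",
    "FamilyB": "OrderX",
    "OrderX": "Root",
}
GENUS_FAMILY = {"GenusA": "FamilyA", "GenusB": "FamilyA", "GenusC": "FamilyB"}
SYNONYMS = {"GenusOld": "GenusA"}   # synonym -> accepted genus name
SPECIES = {"GenusA": 10, "GenusOld": 2, "GenusB": 5, "GenusC": 7}


def accepted_name(genus):
    """Resolve a genus name to its accepted name (unchanged if not a synonym)."""
    return SYNONYMS.get(genus, genus)


def node_totals():
    """Return {node: (number of genera, number of species)} for every node."""
    # Merge the species counts of synonyms into their accepted genus.
    per_genus = defaultdict(int)
    for name, count in SPECIES.items():
        per_genus[accepted_name(name)] += count

    genera = defaultdict(set)
    species = defaultdict(int)
    for genus, count in per_genus.items():
        node = GENUS_FAMILY[genus]
        # Walk from the family up to the root, crediting each ancestor.
        while node is not None:
            genera[node].add(genus)
            species[node] += count
            node = PARENT.get(node)
    return {node: (len(genera[node]), species[node]) for node in genera}
```

In this sketch the synonym is folded into its accepted genus before any counting takes place, so a genus is never counted twice at a higher node.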

Project Objectives

The primary objective is to rationalize the way data are organized and stored on APWEB to create a dynamic and extensible framework for integration and access to APWEB data for both internal and external applications. This will allow the creation of different kinds of user interfaces to the data depending on the needs of the particular end user, for example a teaching interface, a browsing interface, a structured query-based interface, a web-services interface, etc.
A related objective is to improve APWEB's data visualization capabilities, for example through dynamic representation of phylogenetic trees and vectorized, annotated map information to facilitate both visualization and geographic queries.

Overview of Deliverables

Atomization and restructuring of existing data to facilitate automated parsing; design of a suitable schema and migration of the data to a database; a foundational infrastructure for query-based access to information in the database.
A web-based interface that assembles the data into a web site similar in appearance and functionality to APWEB, but with enhanced integration and visualization tools to improve APWEB's value as a virtual teaching tool.
A flexible web services framework to facilitate interaction with Phylomatic and other resources (e.g., external databases of characters/traits) and also to offer APWEB data to external applications.
Support for structured queries generated both internally (via a web site) and externally (via a web-services layer) to address research questions, such as relating trait data to the reference phylogeny or user-supplied trees.
Digitization/vectorization of existing global distribution maps to improve display and query capabilities.
A genus-level synonymy for APWEB (possibility of synergy with iPToL data integration working group).
Tools for ongoing, sustainable curation and updates to the APWEB database.

Approach

Rationalize the existing data using consistent tags, headers, etc. in the HTML to facilitate automated parsing. Atomize information into consistent, logical units, decoupling data from the existing "order page" framework. Replace bitmapped distribution maps with vector-based maps.
Design and implement a database schema, develop Perl-based database loading and query infrastructure, where possible using existing open source infrastructure components.
Develop a Perl/CGI-based web interface for the APWEB2 web site that will sit atop the new database.
Develop a generic RESTful web services interface, where possible using existing standards and infrastructure components.
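
The first two steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the charter calls for Perl, whereas this sketch uses Python's standard library; the HTML markup, class names, table layout, and function names are all hypothetical, and the real APWEB pages are structured differently.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical, consistently tagged markup (placeholder content only).
SAMPLE_PAGE = """
<h2 class="family">FamilyA</h2>
<p class="characters">Leaves opposite.</p>
<p class="distribution">Tropical.</p>
<h2 class="family">FamilyB</h2>
<p class="characters">Leaves alternate.</p>
"""


class OrderPageParser(HTMLParser):
    """Collect (family, section, text) records from consistently tagged HTML."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._family = None
        self._section = None    # class of the element currently being read

    def handle_starttag(self, tag, attrs):
        self._section = dict(attrs).get("class")

    def handle_endtag(self, tag):
        self._section = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._section is None:
            return
        if self._section == "family":
            self._family = text         # start of a new family block
        else:
            self.records.append((self._family, self._section, text))


def load(page, conn):
    """Atomize one page and store its units; return the number of rows added."""
    parser = OrderPageParser()
    parser.feed(page)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statement "
        "(family TEXT, section TEXT, body TEXT)"
    )
    conn.executemany("INSERT INTO statement VALUES (?, ?, ?)", parser.records)
    conn.commit()
    return len(parser.records)


def sections_for(conn, family):
    """Return {section: body} for one family, independent of any order page."""
    rows = conn.execute(
        "SELECT section, body FROM statement WHERE family = ?", (family,)
    )
    return dict(rows.fetchall())
```

Once the units are stored this way, a query such as sections_for(conn, "FamilyA") retrieves a family's data without reference to the page it came from, which is the decoupling the approach aims for; a web interface or web-services layer could be built on queries of this kind.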

Success Criteria

Achievement of technical goals: A robust database that replicates all data currently available via APWEB and also supports a web-services layer.
The ability to use APWEB's accumulated information resources in new and value-added ways.
Adoption of APWEB's teaching and data resources by current and new end-user communities.

Key Assumptions

New information will continue to be added and integrated into the enhanced APWEB database
Tools for managing the enhancements, tracking progress and meeting objectives will be required
Data in their current state will be pre-processed into a useful format for parsing

Resources

A unix/linux testing environment and development web server (iPlant)
A production database and web server for the final APWEB2 implementation (UMSL)
A UMSL staff member/domain specialist capable of rationalizing/processing data; iPlant expertise in database design and implementation.
Note that iPlant should receive credit for this effort, and mechanisms for acknowledgement, hosting, linkage to iPlant's DE, etc. need to be established and agreed to.

Roles and Responsibilities

Role	People
APWEB2 Coordinator	Peter Stevens
APWEB2 Collaborators	Campbell Webb, Amy Zanne
iPlant/iPTOL Coordinator	Sheldon McKay
iPlant/iPTOL Project Manager	Michael Gonzales
iPlant Developer	TBA
UMSL Domain specialist	TBA

Signatures

The following people agree that the above information is accurate:
Project team members:
Project sponsor and/or authorizing manager(s):

Notes/Comments

There is a strong preference to choose simple and robust database and web services infrastructure components so that APWEB2 can be maintained after the collaboration with iPlant ends.
UMSL staff member 0.5 FTE/yr; possible summer or travel support for PIs Webb and Zanne.

Risks

TBD