Trait Evolution Working Group
Project Plan
June 22, 2009
Version 1.0

Background

The iPlant Board of Directors-constituted Review Team met in early March 2009 and recommended immediate engagement with the iPlant Tree of Life (iPToL) Grand Challenge Project Team to initiate a two-year project constructing a Phylogenetics CyberInfrastructure. In early May 2009, members from iPlant’s Executive Team and Engagement Team met with leads from the iPToL Grand Challenge Team to develop a management plan and create a roadmap for work to be conducted over the next two years. Collaborative implementation was organized into working groups with focused development goals. The four main working groups are: Big Trees, Data Assembly, Tree Reconciliation, and Ancestral Character State Reconstruction. Two crosscutting working groups to develop shared data and compute infrastructure are Data Integration and Visualization.

Phylogenetic trees provide information about organism relatedness, which can help inform discussions about the history of life and taxonomy. Perhaps more importantly, these trees can be used to understand how organisms have evolved through time. Relating phylogenetic trees to the evolution of specific traits makes it possible to address which traits affect diversification, biogeographic patterns, evolution in response to past climate change, co-evolution of pollinators and flowers or of hosts and parasites, the influence of genetic architecture on morphological evolution, patterns of community assembly and interaction, and far more. Many methods and software programs are used across the life science disciplines to map traits onto trees. However, with the unprecedented increase in available sequence and phylogenetic data, these programs may not scale well. In some cases, the programs were written for trees with fewer than a thousand taxa, and their memory management, speed optimization, and other aspects of program design are poorly suited to the much bigger trees that are coming online. Even well-designed programs get bogged down to the point where real-time use is no longer feasible for many users. There is a real risk that once a few specialists complete the difficult job of creating a 500K-taxon tree, the far larger community of consumers will put it to only limited use because the infrastructure for dealing with such a large tree is absent or underdeveloped. The work done by this group in developing an infrastructure for downstream analysis of large trees is essential to maximizing what can be learned about plant biology using phylogenetics and to capitalizing on the extensive work being done to optimize large-scale phylogeny reconstruction.

Goals and Objectives

The iPlant Tree of Life (iPToL) project addresses the construction of the green plant tree of life to aid in understanding the diversification of green plants over the last billion years. The project will also build cyberinfrastructure to connect this tree to the rest of the plant sciences community and beyond, and to support post-tree analyses.

Trait evolution, a post-tree analysis, gives the scientific community the ability to make inferences about processes ranging from events millions of years ago to changes as recent as the evolution of HIV. Incorporating phylogenetic trees can be essential for correctly interpreting data gathered from multiple species, because species share evolutionary history and are therefore not independent and identically distributed data points.

Scope

As part of the initiative to build a Discovery Environment for iPlant, and toward the objectives of the Phylogenetics Grand Challenge Project, a web-based environment will be created. This environment will: 1) receive trees, through user upload of their own data, selection of a pre-defined tree, or queries of tree databases; 2) analyze trait data that can be supplied by the user or imported from trait databases; and 3) report the results of analyses. The methods to be provided for analyzing trait data will include independent contrasts, continuous ancestral character state reconstruction (with confidence measures), discrete ancestral character state reconstruction (with confidence measures), correlation (Pagel 1994), and various stretching processes (Blomberg's K, various Pagel transforms, etc.).
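To make the first of the listed methods concrete, the following is a minimal sketch of Felsenstein's independent contrasts for a single continuous trait. It is an illustration only, not part of any iPlant software: the names Node and independent_contrasts are hypothetical, and the sketch assumes a strictly bifurcating tree with positive branch lengths, a trait value on every tip, and unique names for internal nodes. The traversal is iterative rather than recursive so that very deep trees do not hit Python's recursion limit.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical tree node; not an iPlant data structure."""
    name: str
    branch_length: float = 0.0          # length of the branch above this node
    value: Optional[float] = None       # trait value (required for tips)
    children: List["Node"] = field(default_factory=list)


def independent_contrasts(root: Node) -> Dict[str, float]:
    """Return standardized contrasts keyed by internal node name.

    Assumes a bifurcating tree with positive branch lengths.
    """
    # Collect nodes parents-first, then process them in reverse so that
    # every child is handled before its parent.
    order: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(node.children)

    # id(node) -> (trait estimate, branch length adjusted for estimation error)
    state: Dict[int, tuple] = {}
    contrasts: Dict[str, float] = {}

    for node in reversed(order):
        if not node.children:
            if node.value is None:
                raise ValueError(f"tip {node.name!r} has no trait value")
            state[id(node)] = (node.value, node.branch_length)
            continue
        if len(node.children) != 2:
            raise ValueError(f"node {node.name!r} is not bifurcating")

        (x1, v1), (x2, v2) = [state[id(child)] for child in node.children]

        # Standardized contrast: trait difference scaled by its expected
        # standard deviation under Brownian motion.
        contrasts[node.name] = (x1 - x2) / math.sqrt(v1 + v2)

        # Ancestral estimate is the branch-length-weighted average of the
        # two descendants; the branch above is lengthened to reflect the
        # uncertainty of that estimate.
        estimate = (x1 / v1 + x2 / v2) / (1.0 / v1 + 1.0 / v2)
        adjusted = node.branch_length + (v1 * v2) / (v1 + v2)
        state[id(node)] = (estimate, adjusted)

    return contrasts


# Usage on a three-taxon tree ((A:1,B:1)AB:1,C:2)root with made-up values.
tip_a = Node("A", 1.0, 1.0)
tip_b = Node("B", 1.0, 3.0)
tip_c = Node("C", 2.0, 5.0)
clade_ab = Node("AB", 1.0, children=[tip_a, tip_b])
tree = Node("root", 0.0, children=[clade_ab, tip_c])
result = independent_contrasts(tree)
```

A tree with n tips yields n - 1 contrasts, and each node is visited once, so the cost grows linearly with tree size. The sign of each contrast depends on the arbitrary ordering of the two children.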

Deliverables

Work Breakdown Structure

Scheduled Milestones

Date Milestone
6/17/2009 Project Start
6/17/2009 Project Kick-off Meeting

See Project Schedule for key assumptions

Project Schedule

Key Assumptions:
1. Human resources will be available to work on these projects when needed.

Project Resources

Two units of summer support ($10k per unit) are allocated for Brian O’Meara as project champion. Two $50k fellowships are allocated for one postdoc and one graduate student. Funding for iPlant staff members, meetings, workshops, and EOT activities comes from other sources and is not included in the working group budget.

Grand Challenge team members will have access to a scalable pool of reliable, enterprise-class virtual servers for providing persistent web services, access to world-class high-performance computing resources, and access to large-scale, redundant storage systems with petascale capacity.

Project Success Criteria

• Creation of a web app that is actually used to do analyses quickly and easily
• The app accepts trees and computes contrasts
• The app is useful for large-scale analyses
• A developed pathway/protocol for adding other methods (through wrapping, recoding, etc.) to the discovery environment

Dependencies/Constraints/Assumptions

Broadly, iPlant is designed to build cyberinfrastructure, not to generate new data. Thus, the Board of Directors recommends that iPlant focus not on new algorithm development but on providing HPC and scale-up expertise in support of existing software. The key is to be able to solve problems that need to be solved. Achieving these goals carries risk. Best practices will be used to attack problems; if progress is not being made, the problem may be brought to the Scientific Opportunities Team to discuss developing a proposal for algorithm development.

iPlant’s Engagement Team and developers will have the most knowledge of the shared discovery environment. The work of the Ancestral Character State working group must complement that of the other working groups, yet remain independent enough that a delay in one group does not dramatically affect the progress of the others (e.g., independent contrasts on large trees will be useful to many researchers even before the 100,000-taxon plant tree is created).

Risk Summary

• Difficulty of use
• Lack of future extensibility (a way to incorporate new methods easily is needed)
• Only moderate success (e.g., the tool is used only a few times; how frequently is it used?)
• Overcommitment