TE_Project Charter

Project Charter
August 5, 2009
Version 1.4

Project Title: Trait Evolution
Phase I Start Date: June 2009
Phase I End Date:  June 2011

Project Justification:

The iPlant Board of Directors-constituted Review Team met in early March 2009 and recommended immediate engagement with the iPlant Tree of Life (iPToL) Grand Challenge Project Team to initiate a two-year project constructing a Phylogenetics CyberInfrastructure.  In early May 2009, members from iPlant’s Executive Team and Engagement Team met with leads from the iPToL Grand Challenge Team to develop a management plan and create a roadmap for work to be conducted over the next two years.  Collaborative implementation was organized into working groups with focused development goals. The four main working groups are: Big Trees, Data Assembly, Tree Reconciliation, and Ancestral Character State Reconstruction.  Two crosscutting working groups to develop shared data and compute infrastructure are Data Integration and Visualization.

Phylogenetic trees provide information about organism relatedness, which can help inform discussions about the history of life and taxonomy. Perhaps more importantly, these trees can be used to understand how organisms have evolved through time. Questions about which traits affect diversification, biogeographic patterns, evolution in response to past climate change, co-evolution of pollinators and flowers or hosts and parasites, the influence of genetic architecture on morphological evolution, patterns of community assembly and interaction, and far more can be addressed by relating phylogenetic trees to the evolution of specific traits.  Many methods and software programs are used across the life science disciplines to map traits onto trees. However, with the unprecedented increase in available sequence and phylogenetic data, these programs may not scale well.  In some cases, the programs were written for trees with fewer than a thousand taxa and do not handle memory management, speed optimization, or other aspects of program design well enough for the much bigger trees that are coming online.  Even well-designed programs bog down to the point where real-time use is no longer feasible for many users.  There is a real risk that once the difficult job of creating a 500K taxon tree is completed by a few specialists, it will be put to only limited use by the far greater community of consumers because the infrastructure for dealing with such a large tree is absent or underdeveloped.  The work done by this group in developing an infrastructure for downstream analysis of large trees is essential to maximizing the amount that can be learned about plant biology using phylogenetics and to capitalizing on the extensive work being done to optimize large-scale phylogeny reconstruction.

Project Objectives:

The iPlant Tree of Life (iPToL) project addresses the construction of the green plant tree of life to aid in the understanding of the diversification of green plants over the last billion years.  This project will also build cyberinfrastructure to connect this tree to the rest of the plant sciences community and beyond, and to support post-tree analyses.

Trait evolution, a post-tree analysis, gives the scientific community the ability to make inferences about processes ranging from those that happened millions of years ago to those as recent as HIV evolution. Incorporating phylogenetic trees can be essential for correctly interpreting data gathered from multiple species, because these species are not independent and identically distributed data points. [traitwg:more scientific justification is needed here]

Overview of Deliverables:

Final Deliverable: A web-based environment that receives trees (through user upload of their own data, selection of a predefined tree, or queries of tree databases), analyzes trait data that can be supplied by the user or imported from trait databases, and reports the results.  The discovery environment will provide portals to other larger databases such as TreeBase and plant genome databases.

Priorities of methods to provide:

  • Independent contrasts
  • Continuous ancestral character state reconstruction (with confidence measures)
  • Discrete ancestral character state reconstruction (with confidence measures)
  • Correlation (Pagel 1994)
  • Various stretching processes (Blomberg K, various Pagel transforms, etc.)
  • Additional methods (discussion needed when nearing that point)
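As an illustration of the first method on the list, the following is a minimal sketch of independent contrasts as described by Felsenstein (1985), written in Python. It is not the implementation the working group will deliver. The Node class and the function name are hypothetical, the tree is assumed to be rooted and strictly bifurcating (polytomies are rejected), and the traversal avoids recursion so that deep trees with tens of thousands of taxa do not hit the interpreter's recursion limit.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical node of a rooted, strictly bifurcating tree."""
    branch_length: float = 0.0       # length of the branch leading to this node
    value: Optional[float] = None    # continuous trait value (tips only)
    children: List["Node"] = field(default_factory=list)


def independent_contrasts(root: Node) -> List[float]:
    """Return one standardized contrast per internal node."""
    # Iterative preorder walk: every parent is listed before its descendants.
    order: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(node.children)

    # For each processed node: (trait estimate, effective branch length).
    state: Dict[int, Tuple[float, float]] = {}
    contrasts: List[float] = []

    # Reversing the preorder list visits children before their parent.
    for node in reversed(order):
        if not node.children:
            if node.value is None:
                raise ValueError("every tip needs a trait value")
            state[id(node)] = (node.value, node.branch_length)
            continue
        if len(node.children) != 2:
            raise ValueError("polytomies are not handled in this sketch")
        (x1, v1), (x2, v2) = (state[id(child)] for child in node.children)
        total = v1 + v2
        if total <= 0:
            raise ValueError("sister branches must have positive total length")
        # Standardized contrast between the two daughter lineages.
        contrasts.append((x1 - x2) / math.sqrt(total))
        # Weighted average of the daughters, weights inverse to branch length.
        estimate = (x1 * v2 + x2 * v1) / total
        # Lengthen the parent's branch to reflect uncertainty in the estimate.
        state[id(node)] = (estimate, node.branch_length + v1 * v2 / total)
    return contrasts


# Toy tree ((A:1,B:1):1,C:2) with made-up trait values A=1, B=3, C=5.
a = Node(branch_length=1.0, value=1.0)
b = Node(branch_length=1.0, value=3.0)
c = Node(branch_length=2.0, value=5.0)
ab = Node(branch_length=1.0, children=[a, b])
root = Node(children=[ab, c])
contrasts = independent_contrasts(root)
```

A tree with n tips yields n - 1 contrasts. Contrasts computed this way for two traits on the same tree can then be compared with a regression through the origin.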

Major Year 1 Milestones:

  • Identify current limits of software (e.g., program A fails in B way with method C [traitwg:from prioritized list above] on a dataset with D taxa and E characters of F type)
  • Discovery Environment that accepts trees and data and does independent contrasts
  • Algorithms optimized to work on trees of at least 50k taxa
  • Creation of a sham 500k taxa tree that includes data for two correlated continuous characters and two correlated discrete characters
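The sham tree in the last milestone could be generated along the following lines. This is only a sketch under stated assumptions: the function name, the array-based tree layout, the exponential branch lengths, and the choice of Brownian motion are all illustrative and do not come from the charter, and only the two correlated continuous characters are covered (the discrete characters would need a separate model). Correlation is introduced in the per-branch increments, which share a correlation of rho.

```python
import math
import random
from typing import List, Tuple


def simulate_sham_tree(
    n_tips: int, rho: float, seed: int = 0
) -> Tuple[List[int], List[float], List[Tuple[float, float]], List[int]]:
    """Grow a random bifurcating tree and evolve two correlated traits on it.

    Returns (parent, length, traits, tips), where parent[i] is the index of
    node i's parent (-1 for the root), length[i] is the branch leading to
    node i, traits[i] holds both trait values at node i, and tips lists the
    indices of the terminal nodes.
    """
    if n_tips < 1 or not -1.0 <= rho <= 1.0:
        raise ValueError("need n_tips >= 1 and -1 <= rho <= 1")
    rng = random.Random(seed)
    parent = [-1]
    length = [0.0]
    traits = [(0.0, 0.0)]
    tips = [0]
    while len(tips) < n_tips:
        # Pick a random tip and remove it from the tip list in O(1).
        k = rng.randrange(len(tips))
        node = tips[k]
        tips[k] = tips[-1]
        tips.pop()
        # Replace it with two daughters, each on a random-length branch.
        for _ in range(2):
            t = rng.expovariate(1.0)
            z1 = rng.gauss(0.0, 1.0)
            z2 = rng.gauss(0.0, 1.0)
            # Brownian increments with variance t and correlation rho.
            dx = math.sqrt(t) * z1
            dy = math.sqrt(t) * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
            parent.append(node)
            length.append(t)
            traits.append((traits[node][0] + dx, traits[node][1] + dy))
            tips.append(len(parent) - 1)
    return parent, length, traits, tips


# Small demonstration; the milestone itself calls for 500k taxa.
parent, length, traits, tips = simulate_sham_tree(n_tips=1000, rho=0.8, seed=42)
```

The flat lists are used instead of linked node objects so that memory use stays predictable as the tip count grows. The resulting tree is not ultrametric, which may or may not matter for the benchmarks the group chooses.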

Approach:

A working group will be formed to address both the short-term and long-term objectives above.  This group will be composed of Brian O’Meara (project champion) and at least one postdoc or grad student (Jeremy?) identified by the iPToL PIs, along with Sheldon McKay, Karla Gendler, and other members of the iPlant Phylogenetics Engagement Team. There will be frequent contact between iPToL members and iPlant in the form of scheduled weekly meetings and ad hoc contact (IRC, email, etc.).  Developers will be able to address questions to the biologists as they arise (e.g., is there a need to deal with polytomies? to support different data formats?).  Planning and development will be made as public as possible through the use of mailing lists, wikis, and discussion forums as ways of engaging the broader community and receiving feedback early in the design process.
 

iPlant’s Engagement Team will work with the working group to gather requirements and, if needed, prototype the solution to provide proof of concept.  These requirements and prototype(s) will be given to iPlant’s core developers for iterative prototyping of the solution(s).  The group will then work with the iPlant core developers to bring the software to production once the specifications identified in collaboration with iPToL scientists have been met.  Releases will be early and often to show progress on the project and gain support and user feedback.

Sheldon McKay and Karla Gendler will give monthly status reports to the Steering Committee and will also report back any changes and/or additions that have been identified by the Steering Committee.

Success Criteria:

  • Creation of a web app that is actually used to do analyses quickly and easily
  • Serves trees and does contrasts
  • Useful for large scale analyses
  • Developed pathway/protocol for addition (through wrapping, recoding, etc.) of other methods to the discovery environment

Key Assumptions:

Broadly, iPlant is designed to build cyberinfrastructure and not generate new data. Thus, the Board of Directors recommends that iPlant not focus on new algorithm development but instead on providing HPC and scale-up expertise in support of existing software.  The key is to be able to solve problems that need to be solved.  There is risk assumed in achieving these goals.  Best practices will be used to attack problems, and if progress is not being made, the problem can be brought to the Scientific Opportunities Team to discuss developing a proposal for algorithm development.
iPlant’s Engagement Team and developers will be the people with the most knowledge of the shared discovery environment.  It will be necessary that the work of the Trait Evolution working group complements that of the other working groups, yet remains independent enough that delay in the progress of one group does not dramatically affect the progress of other groups (e.g., doing independent contrasts on large trees will be useful to many researchers even before the 500,000 taxon plant tree is created).

Resources:

Two units of summer support ($10k per unit) are allocated for Brian O’Meara as project champion.  Two $50k fellowships are allocated for one postdoc and one graduate student. Funding for iPlant staff members, meetings, workshops, and EOT activities comes from other sources and is not included in the working group budget.

Grand Challenge team members will have access to a scalable pool of reliable, enterprise class virtual servers for providing persistent web services, access to world-class high performance computing resources, and access to large scale, redundant storage systems with petascale capacity.  Below is a description of what is currently available to iPlant.  Note that these will change with time and needs.

  • Compute: Ranger, Lonestar, Stampede (UT/TeraGrid); Saguaro, Sonora (ASU); Marin, Ice (UA)
    • ~700 Teraflops, more computing power than existed in all the Top 500 computers in the world 4 years ago
  • Storage:  Corral, Ranch (UT), Ocotillo (ASU)
    • Well over 10 Petabytes of storage can be made available for the project, on scalable systems capable of growing much more.
  • Visualization:  Spur, Stallion (UT), Matinee (ASU), UA-Cave
    • Among the world’s largest visualization systems
  • Virtualized/Cloud Services:  iPlant (UA) and ASU virtual environments, vendor clouds
    • Positioned to use cloud technologies to deliver persistent gateways and services to users

Roles and Responsibilities:

Both iPToL and iPlant will work together to establish an effective team consisting of iPlant personnel and appropriate super users/super postdocs to create use cases and specifications for objectives and deliverables.  
iPlant's organization chart can be found in Appendix A.  Appendices B and C contain role descriptions for the iPlant Engagement Team and the iPToL team, respectively.  A list of key personnel is attached as Appendix D.  The list is not comprehensive; please add names as appropriate.
Recruitment for postdoctoral candidates should commence immediately by the iPToL PIs; iPlant will ensure funds are in place as soon as possible.

Conflict of Interest Policy:

A conflict of interest policy is currently being developed by iPlant in collaboration with the National Science Foundation, the iPToL Grand Challenge Team, and the Genotype-to-Phenotype Grand Challenge Team.  In general, each participant should follow their own institution’s conflict of interest policy.
As an example of the policy being developed, a CoI would exist if a GCT lead or member could benefit financially from a piece of software, such as via a spouse or relative who developed the software or worked for the company that developed and sold it.

Signatures (the following people agree that the above information is accurate):

Project team members:
Project sponsor and/or authorizing manager(s):
Notes/Comments:

Appendix A: iPlant CI Development Team Organization Chart
 

Appendix B: iPlant Engagement Team Roles

Scientific lead (Sheldon McKay):

  • Interfaces with faculty and super users
  • Provides design input and scientific leadership to the engagement team
  • Reviews all deliverables
  • Holds regular status meetings
  • Provides regular status reports to Project Manager
  • Manages and resolves team-level risks, issues, and changes

Project Manager (Karla Gendler):

  • Aids Scientific Lead in supervising and providing technical direction to project team
  • Executes project management processes: risk, issues, change, quality, and document management
  • Ensures the project plan and schedule are followed; detects and manages variances
  • Provides weekly project status reports
  • Facilitates weekly team status meetings

Team Member:

  • Major activities they will do (defined at NESCent meeting and after)
  • Deliverables they will produce (defined at NESCent meeting and after)
  • Attends status meetings or other appropriate meetings
  • Participates in project management processes such as risk, issue, and document management

Human Interface Specialist (Brenton Elmore):

  • Brings user experience and usability methods to user interfaces and services
  • Helps define user requirements for the various iPlant community segments
  • Provides training and support to development teams in user-centered design and usability
  • Develops quantifiable metrics to evaluate iPlant's user interfaces

Appendix C: iPToL Team Roles

Project lead (Michael Sanderson):

  • Interfaces with iPlant Executive Team
  • Participates in Steering Committee Meetings
  • Reviews all deliverables
  • Oversees and manages all working groups
  • Point of contact for project

Steering Committee Member

  • Meet once a month via phone

Project Champion (Brian O'Meara)

  • Point of contact for project

Super User/Postdoc/Grad Student

Working Group Member

Appendix D: Personnel
(updated as the project progresses)

Name | Title | Role | Contact Information
Michael Sanderson | Proposal principal leader | main contact | sanderm@email.arizona.edu
Michael Donoghue | Proposal principal leader; Plant science community leader | | michael.donoghue@yale.edu
Pamela Soltis | Proposal principal leader; Plant science community leader | | psoltis@flmnh.ufl.edu
Douglas Soltis | Proposal principal leader; Plant science community leader | | dsoltis@botany.ufl.edu
Val Tannen | Proposal principal leader; Computational science community leader | | val@cis.upenn.edu
Alexandros Stamatakis | Proposal principal leader; Computational science community leader | | stamatak@cs.tum.edu
Todd Vision | Proposal principal leader; Computational science community leader | | tjv@bio.unc.edu
Brian O'Meara | Trait Evolution Working Group | Project Champion | bomeara@utk.edu
Rich Jorgensen | iPlant Principal Investigator | | raj@ag.arizona.edu
Steve Goff | iPlant Project Director | | sgoff@iplantcollaborative.org
Dan Stanzione | iPlant Co-PI; Director of Cyberinfrastructure Development | | dan@tacc.utexas.edu
Martha Narro | iPlant Director of Education, Outreach, and Training | | narro@email.arizona.edu
Sheldon McKay | iPlant Scientific Lead; iPToL Engagement Team | Scientific Lead | mckays@cshl.edu
Karla Gendler | iPlant Project Manager; iPToL Engagement Team | Project Manager | gendlerk@iplantcollaborative.org
Damian Gessler | iPlant Semantic Web Architect | | dgessler@iplantcollaborative.org
Sonya Lowry | iPlant Lead Developer | | sonya@iplantcollaborative.org
Brenton Elmore | iPlant Human Interface Specialist | | brenton@iplantcollaborative.org
Edwin Skidmore | iPlant IT/Infrastructure Lead | | edwin@iplantcollaborative.org

Appendix E: iPlant’s Programmatic Terms and Conditions

This is an excerpt from the cooperative agreement between the NSF and iPlant outlining NSF’s expectations of iPlant.
Program/Project Description: The goal of the program is to establish the iPlant Cyberinfrastructure Collaborative, taking into account the following considerations:
a)    The iPlant Collaborative will utilize new computer, computational science and cyberinfrastructure (CI) solutions to address an evolving array of grand challenge questions in plant science;
b)    The project will be community-driven, involving plant biologists, computer and information scientists and experts from other disciplines working in integrated teams to enable interdisciplinary systems-level scientific queries and analyses;
c)    The project will use community-based processes to select grand challenge questions, employing a multi-step process that includes a community-wide conference at which candidate questions are selected for subsequent feasibility, impact and needs assessment via “readiness symposia”;
d)    The project will develop community-driven, open-access digital Discovery Environments (DE) that are each focused on a grand challenge question through a selection process that includes evaluation of proposals from readiness symposia by a community Board of Directors (BoD);
e)    The DEs will be comprehensive CI systems constructed around a grand challenge question and designed to enable collaboration, information access and integration, computational capabilities, visualization and analysis, modeling and simulation, learning resources, community annotation and other forms of content creation;
f)    The DEs will comprise hardware, software, network infrastructure, connectivity, and the full range of appropriate science and technology expertise emphasizing Web 2.0 and web services approaches along with open-source, community development methods;
g)    The DEs, software tools and systems, novel data sets and the like developed under direct project funding, will be open source and will be made openly available for reuse and repurposing, with attribution;
h)    The research, education and outreach activities will be integrated fully into the project plan through involvement of students and educators in development of DEs;
i)    Education activities will include, but not be limited to i) teacher intern programs with an emphasis on minority recruitment, and built around standards-based teaching modules that use DEs for discovery-based learning, ii) traveling workshops, iii) iPlant Action Teams (IPATS) and iv) integrated active assessment and evaluation components;
j)    A designated Diversity Officer will ensure diversity of all levels of the iPlant Collaborative by: promoting diversity across the project team and advisory groups; engaging school districts serving minority and economically-disadvantaged populations in the educational development programs; performing outreach to a diverse range of professional societies, associations and academic institutions; and designing and implementing project educational activities to reach a diverse population;
k)    Social science activities will be integrated into the project through development by an independent evaluator of a continuing evaluation process with formative and summative phases, organization of regular social science planning workshops by the iPlant Collaborative, and cooperation with social scientists who may conduct their own studies of the iPlant project.