August 31, 2009

Trait Evolution Working Group Meeting

The purpose of this meeting is to introduce key project personnel and the project charter. Clarification on the points below will be distilled into a problem statement for the iPlant developers and requirements analysts.

Agenda

Introductions

  • Sonya Lowry: Lead developer, iPlant core software team
  • Nicole Hopkins: Requirements and Quality Analyst, iPlant core software team

High-level problem statement

  • iPlant developers will convene on Sept 13, 2009.
  • Problem statements (high-level overviews of what the trait evolution working group wants to do) will be analyzed by this group and will result in directed questions for the working group.

Overview of Deliverables:

Final Deliverable: A web-based environment that receives trees (through user upload of their own data, selection of a pre-defined tree, or queries of tree databases), analyzes trait data that can be supplied by the user or imported from trait databases, and reports the results. The discovery environment will provide portals to other larger databases such as TreeBase and plant genome databases.

User Personas

  • Please consult this document for information on the requirements below
  • Define the most common user personas for the target audience that will use this software/discovery environment. (For examples, it would be good to reference Brenton's user experience document. I'm having MMS pull out the user personas and will post them here as a document for a frame of reference.)
  • Map user personas to domain problems (create user scenarios) [NOTE: This has been started here]
    When the specified user interacts with the system, what is wanted, what is expected, and what time frame is acceptable?

Review of Year 1 Milestones:

  • Identify current limits of software (i.e., program A fails in B way with method C from prioritized list above on a dataset with D taxa and E characters of F types)
    Time line? Aug-Oct, 2009
  • Software optimized to work on at least 50k taxa tree
    Time line? Sept 2009 - March 2010
    Starting with PHYLIP (CONTRAST) and Mesquite (PDAP), what other software is targeted for early implementation? [Note (by O'Meara): we need methods implemented, not particular software. For example, we need to be able to do independent contrasts. If this can be done with Phylocom software for enormous trees, we don't need to get PHYLIP and Mesquite working, too. The software programs listed are just potential ways to get methods implemented.]
  • Discovery Environment that accepts trees and data and does independent contrasts
    Time line?
    • Early prototyping will be focused on interface and the initial implementation of independent contrasts functions
    • Addition of software components will proceed as required
      Prototyping Sept 2009 - March 2010
      Engineering and Implementation (iterative) Jan 2009 - end
  • Creation of a sham 500k taxa tree that includes data for two correlated continuous characters and two correlated discrete characters
    Time line? July-Sep, 2009
  • Define and prioritize the biological/technical problems that exist in this field (for example, the first is the ability to run independent contrasts on a 50k taxa tree, perhaps using PDAP or CONTRAST; what others?)
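
As a frame of reference for the independent contrasts milestone above, the method itself is compact. What follows is a minimal Python sketch of Felsenstein's independent contrasts for one continuous trait, assuming a fully bifurcating tree whose branch lengths are all positive. The `Node` class and the `contrasts` function are hypothetical names used only for illustration; this is not the PHYLIP, Mesquite, or Phylocom implementation, and it has not been tested at the 50k taxa scale. It does use an explicit stack instead of recursion, so a very deep tree would not hit Python's recursion limit.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """Hypothetical tree node: tips have a name, internal nodes have two children."""
    length: float                   # length of the branch leading to this node
    name: Optional[str] = None      # set for tips only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def contrasts(root: Node, trait: Dict[str, float]) -> List[float]:
    """Return one standardized contrast per internal node (post-order)."""
    out: List[float] = []
    # id(node) -> (trait value at node, branch length adjusted for estimation error)
    done: Dict[int, Tuple[float, float]] = {}
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if node.left is None or node.right is None:
            # Tip: use the observed value and the raw branch length.
            done[id(node)] = (trait[node.name], node.length)
        elif not children_done:
            # Revisit this node after both children have been processed.
            stack.append((node, True))
            stack.append((node.right, False))
            stack.append((node.left, False))
        else:
            x1, v1 = done.pop(id(node.left))
            x2, v2 = done.pop(id(node.right))
            # Standardized contrast: difference scaled by its expected s.d.
            out.append((x1 - x2) / (v1 + v2) ** 0.5)
            # Weighted average at this node; lengthen its branch to
            # account for the uncertainty in that average.
            x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
            v = node.length + (v1 * v2) / (v1 + v2)
            done[id(node)] = (x, v)
    return out


# Example tree ((A:1,B:1):1,C:2) with made-up trait values.
ab = Node(1.0, left=Node(1.0, name="A"), right=Node(1.0, name="B"))
tree = Node(0.0, left=ab, right=Node(2.0, name="C"))
print(contrasts(tree, {"A": 1.0, "B": 3.0, "C": 5.0}))
```

A tree with n tips yields n - 1 contrasts. Correlating the contrasts of two traits (with the regression forced through the origin) would be the next step, which this sketch leaves out.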

Priorities of methods to provide:

Meeting Notes

Notes: Brian O'Meara

  • Focus first on discrete traits: easier to get large discrete datasets than large continuous datasets.
  • The discovery environment needs to be written in such a way that users will think about whether results are trustworthy and whether the methods apply. If users can run an analysis, they will, even if it's wildly inappropriate, and they'll often just use the program defaults. So, have built-in automatic checks about model appropriateness, warnings about applicability of simple models to huge datasets, etc. For this reason, may want to investigate stretching models first.
  • Ways to deal with uncertainty in trees (topology and/or branch length) should be built-in (i.e., upload a set of bootstrap or Bayes trees and loop over them). If a tree is being served from the tree reconstruction group, it should have some assessment of uncertainty (such as the trees from the individual bootstrap replicates, not just numbers on edges). We should give some thought to uncertainty in trait data, as well (at least loading it, if not directly using it in the implemented methods).
  • Taxonomic intelligence (matching names as taxonomy changes, for example) is a hard problem being worked on by other groups. For now, require a perfect match (except allow for automatic pruning when the taxa in the data set are a clean subset of the taxa in the tree?).
  • Developing ways to interact with trait databases is also not an early priority.
  • Focus on implementing methods, not wrapping software (once a method has been implemented in one way in the discovery environment, that suffices).
  • The next thing needed from us is a high-level problem statement. I'll make a draft statement and send it around tomorrow for comments. If you want to work on "user personas", you can see the start for our working group here and examples of finished ones from a different iPlant project here (PDF). It's probably worth checking to make sure that no large group of users is unrepresented in the ones already listed, but I don't think this is terribly urgent.
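
Two of the points above (exact name matching with automatic pruning, and looping an analysis over a set of bootstrap or Bayes trees) can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `tips_to_prune` and `summarize_over_trees` are hypothetical, trees are treated as opaque objects, and the analysis is assumed to return a single number per tree.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, List, Sequence, TypeVar

Tree = TypeVar("Tree")  # any tree representation; opaque to this sketch


def tips_to_prune(tree_tips: Iterable[str], data_taxa: Iterable[str]) -> List[str]:
    """Require exact name matches; return tree tips that have no trait data.

    Raises ValueError if the data names are not a clean subset of the tree tips.
    """
    tree_set, data_set = set(tree_tips), set(data_taxa)
    missing = sorted(data_set - tree_set)
    if missing:
        raise ValueError("taxa in the data but not in the tree: " + ", ".join(missing))
    return sorted(tree_set - data_set)


def summarize_over_trees(
    trees: Sequence[Tree], analyze: Callable[[Tree], float]
) -> Dict[str, float]:
    """Run the same analysis on every tree and report the spread of results."""
    results = [analyze(tree) for tree in trees]
    if not results:
        raise ValueError("need at least one tree")
    return {"mean": mean(results), "min": min(results), "max": max(results)}
```

Reporting the minimum and maximum alongside the mean is one simple way to show the user how much the result depends on the tree; a real implementation might report quantiles or the full distribution instead.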

iPlant meeting notes