TR_01FEB10

Call-in/WebEx Information

  • This meeting will have a visual component - TR_WebEx

Action items

  • Andy sent out call-in information for the Core Software WebEx demonstrating the prototype Discovery Environment and the current functionality in the production Discovery Environment.
  • In-person meetings between Andy and Todd have been scheduled at NESCent for Feb. 9-11.
  • Sheldon will also attend at NESCent Feb 9-10.

Agenda

1. Brief review, any outstanding action items missed (5 min.)

  • Sheldon regrets that he will be absent from today's call. The low-level TreeBeST pipeline is behind schedule but will be ready for the NESCent visit. Sheldon will also assist Andy with any advance preparation needed prior to the NESCent visit.

2. Requirements Discussion (remainder, ~50 min.)

Requirements Discussion
  • Understand how Tree Reconciliation research is done today (what is not possible, shortcomings, annoyances) - the current workflow [*]

[*] We will continue walking through how research is currently done. We left off on Monday, January 25, discussing the representation of BLAST hits within a cluster or family.

The detailed walkthrough is focusing on the concrete version of the example workflow provided by Todd.

Notes

Attendees: Todd Vision, Cecile Ane, Jim Leebens-Mack, Nicole Hopkins, Andrew Lenards

The call on January 25, 2010 ended with the topic of representing BLAST hits. We began there before moving on to new topics:

  • Jim said that Phytome's approach to representing BLAST hits was fine. Todd indicated that more metadata being included w/ the hits would be useful (like identifiers, full species names, NCBI/GenBank identifiers).

Moving on:

  • [User Action] ...Focus is on Family #139, where does Andy go from here?
    • How do we decide what to select? A user would pick subfamilies based on their knowledge of the subfamily (via the alias/species metadata). You should be able to show all subfamilies together in the database - subfamilies related to each other (aka "tribes") are broken down monophyletically in Phytome. In this list, they all have some relationship (being members of the subfamily) so the user can select any of them.
    • [REVISIT]/[ASIDE] It was suggested that if the entire subfamily was not selected (so particular members were checked in the checkbox interface, but not all) we might want to use some visual representation of this. In other words: how do we represent subfamilies or families not checked by the user? [open question]
      • {return to workflow} our user, Andy, could check multiple subfamilies - in the Phytome Family results page, the subfamilies are grouped in tables with a subfamily name appearing as a header in between results and allowing for the selection of an entire subfamily. Each checkbox represents an individual gene which is a member of the subfamily. A user could select individual genes from the results. A user could select an entire subfamily and, say, one gene from another subfamily to use as an outgroup (one hypothetical usage pattern briefly discussed - brought up by Andy so may not be scientifically plausible).
  • [User Action] our user, Andy, selects Subfamily #11 from Family #139 and clicks the Submit button at the top of the subfamily results.
  • [User Action] our user, Andy, could do several things at this point:
    • [User Choice] he could download and view the gene tree (in newick) available here (indicated to be a "dead end") {basic user}
    • [User Choice] he could download the gene tree (in newick) and create/provide species tree and open in Notung {more savvy user}
    • [User Choice] he could download raw sequence data, perform alignment, generate gene tree then continue on {advanced user}
      • With TreeBeST, you would get the alignment and would have to have a known species tree
    • Phytome does not give the user a species tree.
  • [REVISIT] The real "value add" of a Discovery Environment would be the availability of a BIG consensus tree
    • Otherwise...
      • a user would need to make use of the NCBI taxonomic tree
      • or, compile from the literature a species tree for the groups they're interested in
    • An interim large tree needs to be made available within the Discovery Environment until the BigTrees group has the large/big species tree inferred.
      • So uploading your own species tree is not truly an option for the users that the Tree Reconciliation functionality will be targeting.
    • [REVISIT] {Question} What is needed on "the big species tree" as far as annotations?
      • Follow-on/related... There needs to be an association between species names and gene names
        • Notung does the mapping lexically, meaning that the gene names include as a substring the name of the species
      • The Discovery Environment will need to do this association "behind the scenes" so that it is not a burden to the user - nor is it as obvious.
      • In the data model for genes, there needs to be a species field (which should not be the name, but an NCBI identifier, or metadata field in GenBank), and it needs to be the same as the species name in the species tree.
    • This audience (expected users of Tree Reconciliation Perspective in Discovery Environment) expects to recover the NCBI Taxonomy
      • Note/Issue: Everything that has a sequence will be in NCBI, but lack of synonymy is an issue.
        • Also, the hierarchy used is controversial
    • [User Action] our user, Andy, gets the gene tree (in newick) (which is the User Choice of the "basic user")
    • [User Action] Andy does some work to figure out a way to map gene names to species names (use the Notung substring approach since that is easy).
    • [User Action] Using Notung, open the gene tree and open the species tree then run the reconciliation.
      • Notung does an ATV-style (aka Archaeopteryx) visualization with the duplication/loss events on the species tree colored.
      • PrimeTV does reconciliation in the 'fat tree' visualization.
    • [Moderate-Tangent] Andy asked about tree reconciliation modules in Mesquite
      • Cecile thought there might be some modules for analysis of reconciling "deep coalescence" (aka lineage sorting). Phase 1 of the prototype will focus on gene duplication and loss - with some potential functionality covering horizontal transfer. Deep coalescence is out of scope for now.
      • With 1KP pilot data - group is more concerned w/ duplication, loss, and horizontal transfer
        • Todd indicated that analysis of horizontal transfer was not a solved problem; a collaboration with Jens Lagergren's Bayesian analysis group might allow a "mixed model" approach to be applied.
    • TreeBeST differs from Notung (most important to the prototype, it is a command-line tool which can be incorporated into a pipeline - which is what Sheldon is working on)
      • TreeBeST: input an alignment to it plus a species tree and then you choose a tool to visualize with
    • [User Action] our user, Andy, would find that the ATV-style reconciled tree would be insufficient to make the conclusions discussed in the example workflow. A better way to come up with such conclusions would be with a 'fat tree' visualization of duplication/loss events.
    • Re the example workflow, from Step 5 on - there is no existing tooling
    • [Desired User Action] our user would:
      • select the tips in a reconciled tree (in 'fat tree' style)
      • the future-implemented visualization tool would highlight the Most Recent Common Ancestor (MRCA) in the tree
      • what our user (in the example) wants to know is the ancestral species where the subfamily had a single common ancestor
      • the branch of the species tree that is highlighted toward the selected tips will answer that question
        • Visualization allows her to see the duplications & losses
        • the absence of duplications & losses indicates orthology here
      • So we want to do this across the whole subfamily - and currently ATV (Archaeopteryx) only does this from one gene
      • [DiscoveryEnvironment-Perspectives] The user could select a node in the gene species tree and link to the Trait Evolution Perspective in the Discovery Environment where the user could then provide the necessary data to do a Phylogenetic Independent Contrast.
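
[Illustration] Two of the steps above can be sketched in a few lines of Python: the Notung-style lexical mapping of gene names to species names, and highlighting the MRCA on the species tree for a set of selected tips. This is only a minimal sketch under stated assumptions, not how Notung, TreeBeST or the Discovery Environment implement it. The toy species tree, the gene names and the function names (species_for_gene, mrca, mrca_for_selected_tips) are all hypothetical and were not discussed on the call.

```python
# Hypothetical toy species tree, stored as child -> parent (the root maps to None).
SPECIES_PARENT = {
    "Arabidopsis": "rosids",
    "Populus": "rosids",
    "rosids": "eudicots",
    "Solanum": "eudicots",
    "eudicots": "angiosperms",
    "Oryza": "angiosperms",
    "angiosperms": None,
}

# Tips are the nodes that never appear as a parent.
TIPS = sorted(set(SPECIES_PARENT) - set(SPECIES_PARENT.values()))


def species_for_gene(gene_name, species_names):
    """Lexical mapping: the species name must occur as a substring of the gene name."""
    matches = [s for s in species_names if s.lower() in gene_name.lower()]
    if len(matches) != 1:
        # Zero or several matches: the substring approach cannot decide.
        raise ValueError(f"cannot map {gene_name!r} to one species: {matches}")
    return matches[0]


def path_to_root(node, parent):
    """List of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def mrca(nodes, parent):
    """Most recent common ancestor of the given species-tree nodes."""
    nodes = list(nodes)
    if not nodes:
        raise ValueError("need at least one node")
    common = set(path_to_root(nodes[0], parent))
    for node in nodes[1:]:
        common &= set(path_to_root(node, parent))
    # Walking up from the first node, the first shared ancestor is the most recent one.
    for candidate in path_to_root(nodes[0], parent):
        if candidate in common:
            return candidate


def mrca_for_selected_tips(gene_names):
    """Map selected gene-tree tips to species, then find their MRCA in the species tree."""
    species = {species_for_gene(g, TIPS) for g in gene_names}
    return mrca(species, SPECIES_PARENT)


if __name__ == "__main__":
    selected = ["Arabidopsis_geneA", "Populus_geneB"]  # made-up gene names
    print(mrca_for_selected_tips(selected))
```

The substring rule fails loudly when a gene name matches no species or more than one, which is one reason the notes above ask for an explicit species field (an NCBI identifier) in the gene data model rather than relying on names.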