TE101005

Trait Evolution

Oct. 5, 2010

Invitees/Attendees (in orange)

Barb Banbury (bbanbury@utk.edu)
Jeremy Beaulieu (jeremy.beaulieu@yale.edu)
Joe Felsenstein (joe@gs.washington.edu)
Eric Lyons (elyons@iplantcollaborative)
Naim Matasci (nmatasci@iplantcollaborative.org)
Sheldon McKay (sheldon.mckay@gmail.com)
Brian O'Meara (omeara.brian@gmail.com)

Topics for discussion (updates in orange)

Tree visualization

Working with Nicole, I found some documents describing TE's visualization needs. Edge coloring appears to be the main priority (see the Metadata wish list); Brian also provided some examples of relevant trees (see the Cross cutting needs analysis). At Eric's suggestion, the group looks at the current state of the Phyloviewer. Brian confirms that edge (and node) coloring is the top priority, including the ability to color edges with gradients and to color the triangles that represent collapsed lineages. The color of a triangle should represent the proportion of lineages in which a given trait is present, either by subdividing the triangle or by averaging the colors. Naim reports that the visualization group is leaning towards supporting NeXML and that he will work with the Tree Visualization group on the domain model.
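To make the two options for collapsed-lineage triangles concrete, here is a minimal Python sketch. It assumes each collapsed clade is summarized as a list of booleans (trait present or absent at each tip) and that colors are RGB triples; the function names `triangle_color` and `triangle_bands` are hypothetical and not part of the Phyloviewer.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]


def blend(c1: RGB, c2: RGB, t: float) -> RGB:
    """Linear interpolation between two RGB colors: t=0 gives c1, t=1 gives c2."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))


def triangle_color(trait_present: Sequence[bool], absent: RGB, present: RGB) -> RGB:
    """Averaging option: a single color weighted by the fraction of tips with the trait."""
    if not trait_present:
        raise ValueError("a collapsed clade needs at least one tip")
    fraction = sum(trait_present) / len(trait_present)
    return blend(absent, present, fraction)


def triangle_bands(
    trait_present: Sequence[bool], absent: RGB, present: RGB
) -> List[Tuple[RGB, float]]:
    """Subdividing option: (color, share of the triangle) bands, trait-present band first."""
    if not trait_present:
        raise ValueError("a collapsed clade needs at least one tip")
    n = len(trait_present)
    k = sum(trait_present)
    bands: List[Tuple[RGB, float]] = []
    if k:
        bands.append((present, k / n))
    if n - k:
        bands.append((absent, (n - k) / n))
    return bands


# Usage: a clade where 2 of 4 tips have the trait, blue = absent, red = present.
print(triangle_color([True, False, False, True], absent=(0, 0, 255), present=(255, 0, 0)))
print(triangle_bands([True, False, False, True], absent=(0, 0, 255), present=(255, 0, 0)))
```

Averaging yields one color that is easy to draw but hides the split; the banded version keeps the proportions visible at the cost of more drawing work.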

ape integration and timeline

The R statistical software package ape (Analyses of Phylogenetics and Evolution) offers a comprehensive suite of phylogenetic methods. A key point in favor of ape is that it can estimate the uncertainty of ancestral state reconstructions. Naim has started testing the corresponding function (ace) and whether R code can be run efficiently on the Condor cluster. Another candidate package for integration is geiger, which includes several methods for model fitting.

Naim reports that he has successfully run ace on the Condor cluster and that, since the GUI and file types for ace are the same as for contrasts, he expects to have a working demo within 2 weeks. Because ace can also estimate discrete character states, he thinks he can provide that functionality too. One aspect that has not been addressed yet is the form the output should take, in particular the need to merge the tree with the estimated internal node values; Naim will look into possible solutions. Brian and Barb point out that people will use ace to reconstruct ancestral character states, whereas those who want to obtain the model parameters might prefer geiger. The errors and warnings produced by the scripts are another important part of the output.

Regarding the concurrent implementation of other software, Naim reports that AncML cannot be compiled on the Condor machines and suggests dropping it, especially since the ace implementation is underway. Naim also informs the group that Joe is planning to revise PHYLIP and could use some coding help from iPlant; Naim and Sheldon are looking into it.
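One possible answer to the output question raised above (merging the tree with the internal node estimates) is to write the tree back out as Newick with each internal node labeled by its estimate. The Python sketch below is illustrative only: the nested-tuple tree representation, the `estimates` dictionary, and numbering internal nodes in preorder starting at n_tips + 1 are all assumptions. A real implementation must use the same node numbering as the reconstruction output it merges.

```python
from typing import Dict, Union

# Hypothetical representation: a tip is a label string, an internal node is a tuple of children.
Tree = Union[str, tuple]


def count_tips(tree: Tree) -> int:
    """Number of tips (string leaves) in the tree."""
    if isinstance(tree, str):
        return 1
    return sum(count_tips(child) for child in tree)


def annotate_newick(tree: Tree, estimates: Dict[int, float], digits: int = 3) -> str:
    """Write `tree` as Newick, labeling internal nodes with their estimates.

    Internal nodes are numbered in preorder starting at n_tips + 1 (assumed convention);
    nodes without an estimate are left unlabeled.
    """
    next_id = count_tips(tree) + 1

    def write(node: Tree) -> str:
        nonlocal next_id
        if isinstance(node, str):
            return node
        node_id = next_id  # assign this node's number before visiting children (preorder)
        next_id += 1
        children = ",".join(write(child) for child in node)
        value = estimates.get(node_id)
        label = "" if value is None else f"{value:.{digits}f}"
        return f"({children}){label}"

    return write(tree) + ";"


# Usage: root is node 4, the (A,B) clade is node 5.
print(annotate_newick((("A", "B"), "C"), {4: 1.0, 5: 2.5}))
```

Newick internal-node labels are widely readable by tree viewers, which is why this sketch uses them; uncertainty estimates such as confidence intervals would need a richer format.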

geiger integration

Another priority identified by the working group is model fitting. The R package geiger offers this functionality, and Naim will start looking into it. Barb also points out that geiger contains some additional general-purpose methods that could be useful for tree/data matching (e.g. dropping taxa from a tree when data are missing).
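The kind of tree/data matching Barb mentions can be illustrated with a small pruning sketch. This is not geiger's implementation; it is a hypothetical Python version that assumes a nested-tuple tree without branch lengths and a dictionary of trait values in which `None` marks missing data. Internal nodes left with a single child after pruning are collapsed.

```python
from typing import Dict, Optional, Set, Union

# Hypothetical representation: a tip is a label string, an internal node is a tuple of children.
Tree = Union[str, tuple]


def prune(tree: Tree, keep: Set[str]) -> Optional[Tree]:
    """Drop tips not in `keep` and collapse nodes left with one child.

    Returns None if no tips survive. Branch lengths are not handled in this sketch.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]  # collapse the now-unary node
    return tuple(children)


def drop_missing(tree: Tree, data: Dict[str, Optional[float]]) -> Optional[Tree]:
    """Keep only tips that have a non-missing value; tips absent from `data` are dropped too."""
    keep = {taxon for taxon, value in data.items() if value is not None}
    return prune(tree, keep)


# Usage: B has missing data, so the (A,B) clade collapses to A.
print(drop_missing((("A", "B"), ("C", "D")), {"A": 1.0, "B": None, "C": 2.0, "D": 3.0}))
```

With branch lengths, a collapsed node's length would have to be added to that of its surviving child, which this sketch omits.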

Methods' input validation

To minimize software errors after job submission, a validation step is necessary. As currently designed, validation only checks whether input values and file types are correct. However, a major source of errors is likely to be the file content rather than its format. In particular, a Newick file might contain a correctly specified tree that is nevertheless unsuitable for certain kinds of analyses (e.g. because it is not rooted or not additive). A compatibility table could then be used to ensure that the chosen method will accept the input provided. The group agrees that such a table would be useful, and Naim will put it online.
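A compatibility check of this kind could look roughly like the Python sketch below. The table entries (`method_x`, `method_y`) and property names are placeholders, and the property detection is deliberately crude: a tree is treated as rooted if its outermost node has exactly two children (a basal trichotomy conventionally marks an unrooted tree), and as having branch lengths if any `:` appears. Quoted labels and Newick comments are assumed absent, and additivity is not checked.

```python
from typing import Dict, List, Set

# Placeholder compatibility table: method name -> tree properties it requires.
COMPATIBILITY: Dict[str, Set[str]] = {
    "method_x": {"rooted", "branch_lengths"},
    "method_y": {"branch_lengths"},
}


def root_degree(newick: str) -> int:
    """Number of children of the outermost node, found by counting commas at depth 1.

    Assumes no quoted labels or [comments] in the string.
    """
    depth = 0
    commas = 0
    for ch in newick:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            commas += 1
    return commas + 1


def tree_properties(newick: str) -> Set[str]:
    """Crude detection of the properties used in COMPATIBILITY."""
    props: Set[str] = set()
    if root_degree(newick) == 2:
        props.add("rooted")
    if ":" in newick:
        props.add("branch_lengths")
    return props


def validate(method: str, newick: str) -> List[str]:
    """Return a list of problems; an empty list means the tree passes."""
    required = COMPATIBILITY.get(method)
    if required is None:
        return [f"unknown method: {method}"]
    missing = required - tree_properties(newick)
    return [f"{method} requires a tree with property '{p}'" for p in sorted(missing)]


# Usage: a rooted tree with branch lengths passes; an unrooted topology does not.
print(validate("method_x", "((A:1,B:1):1,C:2);"))
print(validate("method_x", "(A,B,C);"))
```

Keeping the table as data rather than code means it can be published as the online chart and used by the validator at the same time.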

Method list

I have received suggestions for additional methods that could be integrated into the Discovery Environment. At present the focus is on the methods already identified, but I think keeping track of the other possibilities could be useful in the longer term, so a wiki page for such a list will be created. Brian will also invite Ann Stapleton, who provided the suggestions, to join the working group.

Taxa name matching

Barb reports that she had some issues with the interface that lets users match taxon names in the data file to those in the tree file, especially for large groups. She will use it more and collect ideas on how to make the process more efficient (e.g. automatic taxon dropping).
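As one idea for making name matching less manual, here is a hypothetical Python sketch: names are compared after normalization (case-insensitive, with spaces and underscores treated alike), and for data-file names with no exact match it suggests close tree names using the standard library's `difflib.get_close_matches`. The normalization rule and the 0.8 similarity cutoff are assumptions, not part of the current interface.

```python
import difflib
from typing import Dict, Iterable, List, Tuple


def normalize(name: str) -> str:
    """Assumed rule: ignore case, treat spaces and underscores alike, collapse repeats."""
    return "_".join(name.replace("_", " ").lower().split())


def match_taxa(
    data_names: Iterable[str], tree_names: Iterable[str], cutoff: float = 0.8
) -> Tuple[Dict[str, str], Dict[str, List[str]], List[str]]:
    """Return (matched pairs, suggestions for unmatched data names, tree-only names)."""
    tree_by_key = {normalize(n): n for n in tree_names}
    matched: Dict[str, str] = {}
    unmatched: Dict[str, List[str]] = {}
    for name in data_names:
        key = normalize(name)
        if key in tree_by_key:
            matched[name] = tree_by_key[key]
        else:
            # Suggest up to 3 tree names whose normalized form is similar enough.
            close = difflib.get_close_matches(key, list(tree_by_key), n=3, cutoff=cutoff)
            unmatched[name] = [tree_by_key[k] for k in close]
    # Tree taxa with no data are candidates for automatic dropping.
    tree_only = sorted(set(tree_by_key.values()) - set(matched.values()))
    return matched, unmatched, tree_only


# Usage: a misspelled data-file name gets a suggestion instead of silently failing.
matched, unmatched, tree_only = match_taxa(
    ["Homo sapiens", "Pan troglodytes", "Gorila gorilla"],
    ["Homo_sapiens", "Pan_troglodytes", "Gorilla_gorilla", "Pongo_abelii"],
)
print(matched)
print(unmatched)
print(tree_only)
```

Exact matches after normalization could be accepted automatically, leaving only the suggestions and the tree-only list for the user to review.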

Open action items (see also TE jira page)

  • To investigate an optional “make trees available for internal testing” functionality when trees are uploaded for analysis. [Naim]

Completed items

Jira adoption

Naim set up a jira page to track, discuss, and collaboratively work on action items, issues and tasks. The project page is located at https://pods.iplantcollaborative.org/jira/browse/TRAILEVOL.