TE08MAR10

Agenda

1. Discussion about results from retrospective meeting (if it is not cancelled).
2. Output from TreeViz meeting
3. Discussion about software for ancestral state reconstruction (Mesquite, BayesTraits, Brownie, etc.)

Participants

Joe Felsenstein, Natalie Henriques, Liya Wang, Brian O'Meara

Action items

Minutes (links to action items in bold)

iPlant release scheduled for early April.

Links to TreeViz meeting documents, especially the wish list and metadata. Mentioned the need to pass colors or other info from the reconstruction back to TreeViz. Liya mentioned that rather than color, we'd pass numbers (e.g., 0 or 1 on an edge) plus another (NEXUS) block matching numbers to colors, instead of attaching a color to each node. Joe mentioned phyloXML as one way to pass complex info back.
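For reference, phyloXML attaches arbitrary annotations directly to clades via `property` elements. A sketch of what a reconstructed 0/1 state per clade might look like (the `ref` name `iplant:ancestral_state` is made up for illustration; the exact annotation scheme would need to be agreed on with the TreeViz side):

```xml
<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <!-- reconstructed state attached to the root clade -->
      <property ref="iplant:ancestral_state" datatype="xsd:integer"
                applies_to="clade">0</property>
      <clade>
        <name>taxon_A</name>
        <branch_length>0.12</branch_length>
        <property ref="iplant:ancestral_state" datatype="xsd:integer"
                  applies_to="clade">1</property>
      </clade>
      <clade>
        <name>taxon_B</name>
        <branch_length>0.34</branch_length>
        <property ref="iplant:ancestral_state" datatype="xsd:integer"
                  applies_to="clade">0</property>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>
```

TreeViz (or any phyloXML-aware viewer) could then map the 0/1 values to colors on its side, as discussed above.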

Discussion of software for ancestral state reconstruction. Mesquite lacks documentation on running a headless version for this (a post by Liya to the mailing list hasn't been answered yet). It's also Java, and TACC can more easily wrap C/C++ into MPI than Java. BayesTraits was very slow and is closed source. [Joe mentioned that Phylip, including Contrast, is not open source in the strict sense: you can use and modify it freely, but commercial use requires permission. iPlant's use of it is noncommercial, so this is not a problem.] Brownie was slow: on a 50K-taxon tree, a discrete trait reconstruction took 20 hours on Liya's computer (1.8 GHz Intel Core 2 Duo CPU, 2 GB of memory). However, parallelizing it should be possible: it does 15 (by default) independent random starts, which could be run in parallel (currently it uses a single core), though there is variation in how long each start takes to run (it uses the Nelder-Mead simplex for numerical optimization).
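The random-starts parallelization mentioned above could look something like the sketch below. The objective function and the simple local search here are toy stand-ins for Brownie's likelihood and its Nelder-Mead search; only the farm-out pattern (independent seeded starts mapped over a process pool, keep the best) is the point:

```python
import math
import random
from concurrent.futures import ProcessPoolExecutor


def objective(x):
    # Toy stand-in for the negative log-likelihood that Brownie minimizes;
    # the sine term adds local optima, which is why multiple starts help.
    return (x - 2.0) ** 2 + 0.5 * math.sin(5.0 * x)


def one_start(seed):
    """One independent random start: pick a random starting point,
    then run a simple local search (stand-in for Nelder-Mead)."""
    rng = random.Random(seed)
    x = rng.uniform(-10.0, 10.0)
    step = 1.0
    while step > 1e-6:
        moved = False
        for cand in (x - step, x + step):
            if objective(cand) < objective(x):
                x = cand
                moved = True
                break
        if not moved:
            step *= 0.5
    return objective(x), x


def best_of_starts(n_starts=15):
    """Run the starts in separate processes and keep the best result.
    Wall-clock time drops roughly with the number of cores, though
    (as noted above) different starts finish at different times."""
    with ProcessPoolExecutor() as pool:
        return min(pool.map(one_start, range(n_starts)))
```

Since each start is completely independent, the same pattern would apply whether the workers are local cores or MPI ranks on TACC hardware.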

Some discussion of algorithms for doing this sort of analysis (discrete character reconstruction). Basic summary:

The overall approach is to propose a set of parameter values (i.e., gain/loss transition rates), calculate the likelihood of the data on the tree given those values, try a new set of values, and so on until an optimal set is reached (this is the step that could be parallelized, as above). These values are then used for a joint or marginal estimate of the best state at each node, possibly along with the relative evidence for different states (joint = the single best set of states across all nodes at once; marginal = the best state at each node, integrating over all possible states at the other nodes). Calculating the likelihood uses Joe's dynamic programming (pruning) algorithm, which does this efficiently. For estimating the states, Pupko et al. have an efficient algorithm for joint estimates, and Koshi & Goldstein have a fast algorithm for marginal estimates.

There was some discussion of which type of estimate to provide (Brownie does joint); it's not clear that most users have a preference, so perhaps do whichever is easier to implement. There was also discussion of exactly how optimization is carried out in Brownie and other programs, to make sure they use sensible algorithms (rather than trying all values at each node, for example). Joe mentioned that fast algorithms might be possible that take information from estimates elsewhere in the tree (especially in the marginal case). Brian will look into fast algorithms for getting weights for different states (the original Pupko et al. algorithm gives only the single best state).
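The likelihood step above can be sketched for a two-state character. This is a minimal illustration of the pruning (dynamic programming) algorithm, not the implementation in Brownie or any other program discussed; the tree encoding and the stationary root frequencies are illustrative choices:

```python
import math

# Tree encoding (illustrative): a leaf is ("leaf", observed_state),
# an internal node is ("internal", [(child, branch_length), ...]).


def transition_matrix(q01, q10, t):
    """P[i][j] = probability of being in state j after time t, starting
    in state i, for a two-state CTMC with gain/loss rates q01, q10."""
    total = q01 + q10
    e = math.exp(-total * t)
    return [
        [(q10 + q01 * e) / total, q01 * (1 - e) / total],
        [q10 * (1 - e) / total, (q01 + q10 * e) / total],
    ]


def partials(node, q01, q10):
    """Pruning step: likelihood of the data below `node`, conditional on
    each possible state at `node` (computed once per node, bottom-up)."""
    if node[0] == "leaf":
        L = [0.0, 0.0]
        L[node[1]] = 1.0
        return L
    L = [1.0, 1.0]
    for child, t in node[1]:
        P = transition_matrix(q01, q10, t)
        child_L = partials(child, q01, q10)
        for s in (0, 1):
            L[s] *= sum(P[s][sp] * child_L[sp] for sp in (0, 1))
    return L


def log_likelihood(tree, q01, q10):
    """Log-likelihood of the tip data, with stationary root frequencies."""
    L = partials(tree, q01, q10)
    pi = [q10 / (q01 + q10), q01 / (q01 + q10)]
    return math.log(pi[0] * L[0] + pi[1] * L[1])
```

An optimizer (e.g., Nelder-Mead) would then search over q01 and q10 to maximize this log-likelihood, and the per-node conditional likelihoods are also the raw material for the joint and marginal state estimates discussed above.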