DI_10JUL09

Attending: Sheldon, Karla, Damian, Val, Tina-notes

1) preamble and purpose of the meeting (5 mins; Sheldon)

The purpose of this meeting is to get some more detailed guidance from Val on what iPlant needs to move forward in the short term for this working group. Put Val and Damian on the same call to drill down on needs and get going. Within the next couple of weeks – need to generate two key docs for each WG: 1) WG charter; 2) detailed project plan.

2) quick update on post-tree analysis working groups (10 mins)

Trait evolution (Sheldon): Formerly known as Ancestral Character States; headed by Brian O’Meara. DE providing a portal for algorithms of traits onto established rates and the computation needed to do that. Integration and scale-up optimization challenges. Charter and project plan still being worked on, but this is the most advanced of the WGs.

Tree reconciliation (Karla): currently led by Todd V. Covers reconciling gene and species trees, host-parasite coevolution, and reconciling other classes of data. The technical vision of this WG is not as well developed, but it’s also moving forward. Want to accelerate this group to keep pace with Trait Evo WG. Draft charter and plan to be completed in 2 wks or so.

3) few words about what we need to do to move forward with the data integration group (5 mins; Damian)

Mostly chat about well-understood and proven ways to move forward on these problems. Ideally: drill down to use cases and analysis of those, to get to requirements, then to design, to technology phase, then code. Don’t spend months on each; we want to be agile and take an iterative approach. Need to delegate work because data integration is extremely complex.

Need to start engaging with TEWG and TRWG to get clearer sense of what they are doing in detail. Then work with them on requirements and to extract out of them what needs to be done and how, what are the identified data sources, could take a month or two to do this. This would focus us productively for the summer. When that’s well delineated, then in position to talk about design within and outside of the enterprise. Look at technologies needed to deploy that design.

Val subscribes 100% to everything Damian said; he wants to emphasize the need for agility here!

Q: who participated with Trait Reconstruction group? Damian sat in on the telecoms only, no one-on-one meetings. What are their data needs and the nature of their discussions on what they want? Is there a plan for those who were not in that meeting to get access to meeting minutes? Sheldon – meeting notes need to be consolidated in one place, so action item: Sheldon and Karla will make all meeting notes now available on Alfresco to all parties (all notes are in Alfresco and part of the Phylogenetics GCT site).

Val: would like to understand where Brian is on the subject and what he wants, as he wasn’t part of the original proposal team. Sheldon/Damian – Brian’s notes from his talk are greatly informative and would be beneficial to access them; have some sample trees from him.

Damian – can see the strength of individual expertise; the depth of expertise is a positive, but the negative is that it will take active engagement of the experts to translate the data sets and requirements into a system design. We need to be proactive in going to the other WGs and getting info from them to give to the engineers.

Val – must reconcile data from multiple sources, so one thing we can discuss now and make part of the process is concern for data quality. E.g., critical for data quality is the precision of the information used to correlate the trees with the trait info we want to map onto the tree (taxa on the leaves of the tree). There are 3 crosscutting issues: data integration, taxonomic intelligence (Sheldon talking to Bill P about that; needs to be a major priority; urges keeping that in the minds of the developers as they go along). Need an incremental approach organized around various data sets; need to spend time cataloging certain data sets, some already identified, others maybe to be identified by the TE and TR WGs. This is a delicate issue because we have to negotiate with WGs on priorities and can’t do everything in one shot; learn from experience how to handle data sets.

Sheldon – agree about ramping up and carving things into do-able chunks; we may have to have input on how quick and easy it is to integrate data. We don’t want to drive biologists solely by infrastructure people’s needs, though it’s a critical consideration.

Sheldon - Did discuss tax intel with Bill P; here’s an overview: it’s a truly crosscutting issue; we have immediate needs with only partial solutions available at the moment, and short-term needs that may require an ad hoc solution. The range of possible approaches runs from one extreme, implementing a solution from the ground up that draws in source data, to the other extreme, finding a solution that is already available, embracing it, and building minimal database tables to use that solution. Sheldon inclined to use the second approach given that this problem is going to be solved in a grander way by other groups.

Val – fully agree to use second approach; need flexible approach to take advantage of existing mapping, however flawed; use an approach where we can take tax intel from various sources as people develop better solutions. So focus our efforts on the optimum way to use existing solutions, including multiple solutions, and on what we do with this tax intel; urge that we design some form of data cleaning, can’t avoid it. Propose several stages of tax intel; consider informing users of mismatches and allowing manual reconciliation on smaller sets of data. Build into the tools the ability to clean data locally for a particular application; the cleaned datasets may not get back to original sources. Problem is orthogonal to what Bill’s talking about; we will use what we can as best we can, but annoying mismatches could be corrected.
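
Illustration only (not discussed on the call): one way the staged matching Val describes might look, as a minimal Python sketch. The function names `normalize` and `reconcile`, the stage labels, and the sample taxon names are all hypothetical; no particular taxonomic name service is assumed. Tree leaf names are matched to trait-table names first by user-supplied manual corrections, then by exact match, then by a simple normalized match, and whatever is left is reported back for manual reconciliation.

```python
import re


def normalize(name):
    """Replace underscores with spaces, collapse whitespace, lowercase."""
    return re.sub(r"\s+", " ", name.replace("_", " ")).strip().lower()


def reconcile(tree_taxa, trait_taxa, manual=None):
    """Match tree leaf names to trait-table names in stages.

    Returns (matched, unmatched):
      matched   maps each tree name to (trait name, stage label)
      unmatched lists tree names left for manual reconciliation
    `manual` is an optional dict of user corrections {tree name: trait name};
    these are local to the analysis and are not written back to any source.
    """
    manual = manual or {}
    trait_exact = set(trait_taxa)
    trait_norm = {}
    for t in trait_taxa:
        # Keep the first trait name seen for each normalized form.
        trait_norm.setdefault(normalize(t), t)

    matched, unmatched = {}, []
    for name in tree_taxa:
        if name in manual:
            matched[name] = (manual[name], "manual")
        elif name in trait_exact:
            matched[name] = (name, "exact")
        elif normalize(name) in trait_norm:
            matched[name] = (trait_norm[normalize(name)], "normalized")
        else:
            unmatched.append(name)
    return matched, unmatched


# Hypothetical sample data.
tree = ["Homo_sapiens", "Pan troglodytes", "Zea  mays", "Quercus robur L."]
traits = ["Homo sapiens", "Pan troglodytes", "zea mays", "Quercus robur"]

matched, unmatched = reconcile(tree, traits)
print("mismatches to review:", unmatched)

# The user fixes the remaining mismatch by hand and reruns.
matched, unmatched = reconcile(
    tree, traits, manual={"Quercus robur L.": "Quercus robur"}
)
print("mismatches to review:", unmatched)
```

Recording the stage label alongside each match keeps it visible which pairings came from a looser rule or a manual correction, which fits the concern for data quality raised above.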

Damian – hard to model user scenarios at this point, but this WG can position itself at an earlier part of the project by proactively drilling in for requirements. One wants to associate requirements with failure and success points so that one can take an informed, iterative approach to assessing the cost of either implementing or rejecting any given requirement or feature. This drives prioritization, which drives organization. Resources are best spent by concentrating on identifying, delineating, and prioritizing what is required; then we’re in a stronger position to talk about design.

Sheldon - There’s a lot of follow up after this meeting (get Val and Bill other docs) and have some technical discussions via email between calls; no plan to put a postdoc in this working group. Need from Val a scientific summary of the WG and where it’s going, and a list of individuals that might have interest in participating in this endeavor.

Val – Rutger Vos may be interested, but looking for hard core DI/CS communities: has 2 people’s names but hasn’t talked to them about concrete commitments. Action Item: Sheldon will send Val example produced by Brian and Sheldon as a template.

4) Short term tasks and priorities (20 mins; Val)

5) Discussion (20 minutes; all)

Karla – nothing else needs to be covered for the time being.

Sheldon – this group will reconvene when Bill gets back, in early August, to go through what the strategy for tax intel will be.

Val – will send Damian email on tax intel thoughts after he reads Brian’s notes.

End: 11:50 PM

Action Items:

Sheldon and Karla will make all meeting notes, including Brian’s notes, now available on Alfresco, to all parties.

Sheldon will send Val example produced by Brian and Sheldon as a template.

Val – will send Damian email on tax intel thoughts after he reads Brian’s notes.