DA_06JUL09

Attending: Michael D, Pam, Doug, Karla, Nirav, Sheldon, Damian, Tina (notes)

Start: 10:04 AM MST

Working Group Organization/Goals: This is the first organizational meeting of the Data Assembly Working Group (DAWG); Doug is designated lead of the DAWG. The goal is to assemble the people who will do analysis of the data. One possibility is to go to GenBank and download sequences, which is what Alexis is already doing. But this effort is meant to bring in botanists and others who generate data and get them engaged and working together to fill gaps and flesh out the data. It also means making sure everyone is aware of the potential so that people will take advantage of the tools being developed, coordinating proposals, etc.

Question: How do we leverage iPlant and NSF to fund proposals that will fill in the gaps? Pam’s understanding is that NSF is going to be very supportive of projects that will use iPlant infrastructure. Karla confirmed this is correct.

The other iPTOL WGs are one sort of thing, but DAWG is where iPTOL meets the rest of the phylogeny/botanical community. For any iPlant project, it’s key to involve the community, and so this DAWG is where that can occur.

To summarize, the DAWG’s goals are twofold: first, data acquisition; second, community engagement and outreach.

One central issue is how to approach the existing data sets: go for a rectangular matrix with lots of open spaces (unsequenced genes), or take a subset of taxa with completed gene sequences, a very narrow look. What’s the optimal approach? Michael D says DAWG should explore the different strategies. Doug concurs: try multiple approaches, and comparing them would be a nice contribution of this project. Pam wants the DAWG to think about how we get the missing data. What is the structure of the data in GenBank, what does it look like, and how can we work with it? What is a strategy for improving it and filling the data gaps?
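Illustration of the two strategies above, as a minimal sketch only. The taxon names, the gene list, and every function name here are hypothetical and were not discussed in the meeting. The sketch builds the sparse rectangular matrix (all taxa, missing cells marked), picks out the narrow complete subset (only taxa with every gene), and lists the gaps that new sequencing would need to fill.

```python
# Hypothetical toy data: which genes have been sequenced for each taxon.
# Names are illustrative only, not taken from the meeting or from GenBank.
SEQUENCED = {
    "taxon_A": {"rbcL", "matK", "atpB"},
    "taxon_B": {"rbcL", "matK"},
    "taxon_C": {"rbcL"},
    "taxon_D": {"rbcL", "matK", "atpB"},
}
GENES = ["rbcL", "matK", "atpB"]


def sparse_matrix(sequenced, genes):
    """Strategy 1: keep every taxon; unsequenced genes become False cells."""
    return {taxon: [g in have for g in genes] for taxon, have in sequenced.items()}


def fill_fraction(matrix):
    """Share of cells in the rectangular matrix that actually hold data."""
    cells = [cell for row in matrix.values() for cell in row]
    return sum(cells) / len(cells) if cells else 0.0


def complete_subset(sequenced, genes):
    """Strategy 2: keep only taxa for which every gene is sequenced."""
    needed = set(genes)
    return sorted(t for t, have in sequenced.items() if needed <= have)


def gaps(sequenced, genes):
    """(taxon, gene) pairs still missing, i.e. candidates for new sequencing."""
    return sorted(
        (t, g) for t, have in sequenced.items() for g in genes if g not in have
    )


if __name__ == "__main__":
    matrix = sparse_matrix(SEQUENCED, GENES)
    print("fill fraction:", fill_fraction(matrix))
    print("complete taxa:", complete_subset(SEQUENCED, GENES))
    print("gaps to fill:", gaps(SEQUENCED, GENES))
```

With real data the trade-off is the same as in the toy case: the sparse matrix keeps all taxa at the cost of missing cells, while the complete subset has no missing cells but drops taxa.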

Organizing the Data Assembly Workshop: 40+ people are on the current attendee list; what is the optimal number? Pam says 50-ish, but she and Doug keep thinking of more people, and the list doesn’t even include all the iPlant staff yet, so what is the budget/limit? Going as high as 60 would be good. iPlant has to come back to the DAWG with a number; we want those who are invited to feel engaged and to connect with their colleagues. Sheldon suggests mixing in an optimal number of data analysis people so they are not overwhelmed by the botanical folks. Have key people from other WGs present, e.g., Todd, Brian O’Meara, Sanderson, etc. The goal is not to go too far into data analysis, but their presence will benefit them and their WGs as well.

Workshop venue? Biosphere 2 is liked as an option, but it’s not very convenient and room rates have gone up. Other venues to consider are New England (Yale), Dallas-Austin, Tucson-Phoenix, NESCent. Sheldon and Karla will follow up on a solid attendee number limit/budget and location. Be sure to include iPlant staff (6-15?) in the final total number.

Applicable existing data sets and resources: iPlant needs guidance on research that we can do now with existing datasets. GenBank for sure, but what else? There are unpublished data in the TOL project; Steven Smith has those data and is analyzing them. Other NSF-funded TOL projects (bryophytes, conifers, etc.) have extensive datasets that aren’t in GenBank. The barcoding group may have info: CBOL, headquartered in Guelph and the Smithsonian, though its data eventually get deposited in GenBank.

Other issues? Workshops are fine, but the follow-up on agreements/ideas is the hard part. Michael D wants a point person whose role/responsibility is to follow up with those people. Not just a developer, but a point person on the iPlant side who would be the ‘nag’ to follow through, compile, etc. Follow up with web-based tools for compilation/data submittal, and have someone keep track of everything. Need to coordinate with Bill Piel and the tree-based structure. An important point is to consider things that might subvert GenBank; we don’t necessarily want to do that, but we need to think about how data compilation/storage will happen. TOLKEN is a repository for data sequences, morphology, and images; invite someone from TOLKEN to the workshop. It would be good to have developers from iPlant there, because we might ID a new niche but need to know whether software development would be productive/useful. If Steven Smith and Casey Dunn come to the workshop, they’ll know immediately if useful tools are available.

End: 11:01 AM MST

Action Items:

  • iPTOL leads to review invitee list for Data Assembly Workshop and prioritize the list.
  • Sheldon/Karla to provide budget and venue information to consider in planning invitee list.

Note from Tina: Workshop venue/date considerations: SC09 is November 14-20 (Sat-Fri) in Portland, OR; Thanksgiving is Thursday, November 26. iPlant Board of Directors Meeting is tentatively planned for November 20-22, near Portland (?). Generally speaking: consider the effects of weather on travel, so for best attendance in November, consider venues in southern latitudes with an airline hub (e.g., Dallas, Phoenix, Atlanta, Los Angeles).