...

Dec 14, 2010

Invitees/Attendees (in orange)

Barb Banbury (bbanbury@utk.edu)
Jeremy Beaulieu (jeremy.beaulieu@yale.edu)
Joe Felsenstein (joe@gs.washington.edu)
Eric Lyons (elyons@iplantcollaborative)
Naim Matasci (nmatasci@iplantcollaborative.org)
Sheldon McKay (sheldon.mckay@gmail.com)
Brian O'Meara (omeara.brian@gmail.com)
Ann Stapleton (stapletona@uncw.edu)
Luke Harmon (lukeh@uidaho.edu)

Topics for discussion (updates in orange)

User personas, user stories, use cases and acceptance tests

The core software team requires user stories for the analyses currently being implemented (CACE and DACE). The WG has already collected such information for the PIC implementation, and it would be useful to review what is already there as a starting point. The group discussed the various reasons to have user personas and user stories. From the development perspective, these stories will greatly facilitate writing the necessary documentation for the tools and will provide the basis for the user acceptance tests. Given the progress with the integration of DACE and CACE, the development of such stories has become more urgent. The stories should be somewhat application specific but also cover more general aspects (data gathering, visualization, ...).
Ann points out that she has started to work with the Dolan center towards integrating trait evolution methods into the DNA Subway. This offers a formidable opportunity to provide a tool that can be used to teach evolutionary concepts. The group will also take educational needs into account in their user stories.
The stories will be developed interactively and iteratively by the group through a wiki page. A linked wiki page will be used to define user acceptance tests and test datasets. These are basic tests required to make sure that the applications run properly within the iPlant environment. The intended use of each dataset (testing, example, illustration, ...) will need to be indicated. The tests for CACE and DACE have been written and submitted to the QA team, which will start testing as soon as the tools come online.

ape integration and timeline

The R statistical software package ape (Analyses of Phylogenetics and Evolution) offers a comprehensive suite of phylogenetic methods. A key point in favor of ape is that it can estimate the uncertainty of ancestral state reconstructions. Naim has started testing the ace function and the possibility of efficiently running R code on the Condor cluster. Another candidate package for integration is geiger, which includes several methods for model fitting.
Naim reports that he could successfully run ace on the Condor cluster and that, given that the GUI and filetypes for ace are the same as for contrasts, he thinks he can have a working demo within 2 weeks. In addition, because ace can also be used to estimate discrete character states, he thinks he might be able to provide that functionality too. One aspect that has not been dealt with yet is what form the output should have, and in particular the need to merge the tree with the internal node value estimates. Naim will look into what possible solutions are available. Brian and Barb also point out that people will use ace to reconstruct the ancestral character state, whereas if they want to obtain the model parameters, they might prefer to use geiger. Another important part of the output is the set of errors and warnings produced by the scripts.
Naim reports that he encountered some issues with running R that are now solved. He has submitted the code to core software development, who estimate that the implementation effort will take about a week, starting on Wed/Thurs. Unfortunately, core software encountered some problems with the link to the execution framework, which resulted in delays. They indicate that the problem might be resolved later this week. In the meantime, Naim changed the CACE R script so that it can handle multiple traits. Brian asked if it can handle missing data, and Naim answered that it can. However, he has not tested it. He will test it immediately and ensure that there are no problems.
Naim was informed that the problem has been resolved and that development is now focused on the integration. He was told that he should be able to run R scripts by Dec 3rd. The core software team can now run R scripts and is actively integrating DACE into the DE. Integration is still ongoing. Naim will inform the WG as soon as the tools become available.
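
As an illustration of the open output question noted above (merging the tree with the internal node value estimates), one possible form is a Newick string in which each internal node carries its estimate as the node label. The sketch below is only an assumption about what such output could look like, not a decision of the group and not what ace produces. The dictionary-based tree encoding, the node names and the function name to_newick are hypothetical.

```python
def to_newick(tree, estimates):
    """Serialize a tree to Newick, labelling internal nodes with estimates.

    tree: nested dicts with "name", optional "length", optional "children"
          (hypothetical encoding chosen for this sketch).
    estimates: dict mapping internal node names to reconstructed values.
    """
    def fmt(node, is_root):
        children = node.get("children", [])
        if children:
            inner = ",".join(fmt(child, False) for child in children)
            value = estimates.get(node["name"])
            # Internal node: the estimate (if any) becomes the node label.
            text = "(" + inner + ")" + ("" if value is None else format(value, "g"))
        else:
            # Tip: keep the taxon name as the label.
            text = node["name"]
        if not is_root and "length" in node:
            text += ":" + format(node["length"], "g")
        return text

    return fmt(tree, True) + ";"


# Made-up example tree and made-up estimates, for illustration only.
example_tree = {
    "name": "root",
    "children": [
        {"name": "n1", "length": 2.0, "children": [
            {"name": "A", "length": 1.0},
            {"name": "B", "length": 1.0},
        ]},
        {"name": "C", "length": 3.0},
    ],
}
example_estimates = {"root": 1.5, "n1": 0.75}
merged = to_newick(example_tree, example_estimates)
```

A single annotated file of this kind keeps the tree and the estimates together, but numeric node labels can be confused with support values by some viewers, so a separate table keyed by node name would be an equally plausible choice.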

Concurrent implementation of other tools

...

As the number of integrated tools increases, it will be crucial to provide users with the details of, and bibliographic references to, the methods. This information is stored in the tool description and should be output as a text file with every job.
The output of the function dtt includes a graphical plot. As far as members of the group can tell, there is no plan to integrate a plotter into the discovery environment. Because a plotter could be a cross-cutting need, Ann will follow up with the other working groups, and in particular with TV, to assess their needs and plans. As the integration of a plotter could take some months, the short-term solution is to use R's plotting capabilities and output PDF files. Ann points out that the main missing piece is performing the statistical analyses necessary to turn the raw output data into visual information. Naim mentions that he has started discussing with core software a framework to directly connect visual tools through adaptors. He thinks that the framework could be generalized to tool output as well. Ann also mentions that DNA Subway has a framework for tool integration and that it would be valuable to follow its development.

Jeremy worked towards improving the performance of the algorithm to compute the variance-covariance matrix used in geiger. Using matrix algebra, his implementation reduces the runtime for a 10,000 tip tree from >12 hours to approximately 4 minutes! This implementation requires a lot of memory (~10 GB) and a 64-bit architecture. Naim will check with Nirav whether he has any suggestions.
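
For context on the quantity being computed: under a Brownian motion model, the variance-covariance entry for two tips is the branch length they share on the path from the root, and the diagonal entry for a tip is its total distance from the root. The sketch below builds the matrix directly from that definition in plain Python. It is not Jeremy's matrix-algebra implementation and not geiger's code. The tuple-based tree encoding and the function name vcv_from_tree are assumptions made for this example.

```python
def vcv_from_tree(tree):
    """Return (tip_names, matrix) for a rooted tree with branch lengths.

    tree: a node is (payload, branch_length); payload is a tip name (str)
          or a list of child nodes (hypothetical encoding for this sketch).
    matrix[i][j] is the branch length shared by tips i and j from the root.
    """
    names = []
    shared = {}

    def walk(node, depth_above):
        payload, length = node
        depth_here = depth_above + length  # root-to-node distance
        if isinstance(payload, str):
            names.append(payload)
            shared[(payload, payload)] = depth_here
            return [payload]
        groups = [walk(child, depth_here) for child in payload]
        # Tips in different child subtrees share the path down to this node.
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                for x in groups[a]:
                    for y in groups[b]:
                        shared[(x, y)] = shared[(y, x)] = depth_here
        return [tip for group in groups for tip in group]

    walk(tree, 0.0)
    return names, [[shared[(r, c)] for c in names] for r in names]


# Made-up three-tip tree ((A:1,B:1):2,C:3) with a zero-length root branch.
example = ([([("A", 1.0), ("B", 1.0)], 2.0), ("C", 3.0)], 0.0)
tips, matrix = vcv_from_tree(example)
```

This direct approach visits every pair of tips, so both its runtime and the size of the resulting matrix grow quadratically with the number of tips.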

Methods' input validation

...

Find out release plans for big tree, and ask for whatever exists to be made available for testing and preparing examples -- Naim
Identify test datasets for teaching and presentation – WG
- Investigate GBIF dataset -- Naim and Jeremy
Identify test users – WG
Investigate datasets available through MyPlant/Data integration – Naim
Think about possible studies that would highlight the potential of iPlant -- WG
Add real world dataset examples for testing to the wiki – WG
Inquire regarding the plans for code release -- Naim. This is the answer I received:
I plan to talk about this during our iPlant presentation. We will be doing iterative releases and I'll be looking for input from the community on what they would most like to see released first. We don't want to release everything at once only to not be positioned to support it all. That will leave the community feeling that we are negligent. Instead, we'll do one or a few libraries at a time and building up the support needs as we go. Please do let Brian know that we are interested in which aspects of the system he would like to see released first. I would like to get at least one library (or component) out before the end of this month. That, of course, depends upon the licensing due diligence being completed by then.
Forward API info – Naim: https://pods.iplantcollaborative.org/wiki/display/docs/Foundational+API

Other

Brian asks whether there is any feedback from the NSF site visit. Ann reports that one of the major issues is to improve communication within iPlant.
The next meeting will focus on the next goals for TE.

Completed items

Performance

...

To investigate an optional "make trees available for internal testing" functionality when trees are uploaded for analysis. Nicole suggested that this function can be made available within the planned collaborative framework by creating an "iPlant Testing" user group.

...
