iPToL Year 4 Roadmap

Revision 2.3 – Q3

The June-September period has been largely occupied by outreach efforts, with iPToL presence at the following conferences:

In certain areas, iPToL progress has been affected by changes in the Discovery Environment (DE) development plan. Data management, visualization, and integration into the DE were particularly affected. As a result, some original objectives have been dropped or postponed to a date that will depend on overall DE development. In some cases, the working groups have taken on additional projects.

Data Assembly

General Data Assembly

After some initial difficulties, the components of Data Assembly are now either on track for delivery or already completed. Data upload and management capabilities have been greatly improved through a refactoring of the DE backend, and work on the ingestion pipelines is moving forward. Collaborative tools are planned as part of future DE development.
Deliverables:

Strategy:

Tasks:

My-Plant/My-Crop

The backend has been successfully redesigned, which will allow further development of My-Crop.
Deliverables:

Strategy:

Milestones:
1. Official launch 10Q3.
Status:

Tasks:

Trait Evolution

Initial work on integrating tree-stretching (evolutionary) models identified major problems with the optimization routines used by the underlying R package. Because the problems are severe and also affect other packages and functions (such as optim) used across the biological sciences, the group decided to investigate them further, with specific application to phylogenetic questions. A statistics graduate student has joined the group to work on a project that should identify the conditions under which the optimization routines yield unreliable results and recommend which routines are more appropriate.
Members of the group have also integrated four additional tools into the DE. A new tool written by working group members will be integrated and linked from the publication that describes it.
Deliverables:

Strategy:

Milestones:
1. Identified set of 5 components for integration:

2. Released 1st component (PIC) 10Q2.

  1. DACE and CACE included in the 3rd release of DE 11Q1;
  2. PL, lopper, OUCH, Picante, DTT included in the 0.4 release of the DE 11Q2;
  3. Code improvements in two existing R packages (ape, geiger) pushed back to the community, 11Q;
  4. Publications:

Status:

Tasks:

Tree Reconciliation and onekp

Sequencing of the 1,000 transcriptomes has been completed, and the group is now moving into data analysis. The data for the "deep green" publication are being processed, and the analysis should be completed by November 2011. However, the group has identified issues with the current SOAP assembly and is looking into alternatives, including running the Trinity assembler on TACC resources.
Deliverables:

Strategy:

Milestones:
1. Bioinformatic pipeline for gene-species tree reconciliation completed and database populated with the reconciled trees, 10Q4;
2. Developer preview released with the following features, 11Q1:

3. Publications:

4. Completed serial pipeline for Bayesian gene trees
Status:

TR Tasks:

Onekp Tasks:

TNRS

At this point, the only remaining task is to support multiple authorities and multiple taxonomic codes.
Deliverables:

Strategy:

Milestones:

  1. First release of the tool 10Q4;
  2. Completion of Phase 1/Scoping of Phase 2 11Q1;
  3. Support for family and infraspecific epithets 11Q1;
  4. Support for synonyms 11Q2;
  5. Improvements to the GNI parser and TaxaMatch code pushed back to the community, 11Q1;
  6. Publications:


Status:


Tasks:

Big Trees

Both tools (NINJA/WINDJAMMER and RAxML) have been ported to an HPC environment and are available at TACC, so this component of the Big Trees group's work can be considered complete, although the codebase continues to be improved. The tools can now be used to generate large phylogenies as part of the phylogenetics workflow and the perpetually updated tree. A 55K tree has been published and is available through the DE and in the Tree Viewer. In addition, improvements to RAxML were presented at Evolution11.
Deliverables:

NINJA/WINDJAMMER

Strategy:

Milestones:

  1. Software rewritten from Java to C and parallelized with MPI;
  2. Built-in distance matrix calculation added (K2P and Jukes-Cantor for DNA; BLOSUM42 for protein);
  3. Six-day run time reduced 32-fold to 4.5 hours for the 220K-species data set;
  4. Two-to-three-day run time for distance matrix calculation on the 220K set reduced 1,800-fold to 2 minutes.
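
For readers unfamiliar with the two DNA distances named in milestone 2, the following is a minimal Python sketch of the standard Jukes-Cantor (JC69) and Kimura two-parameter (K2P) formulas, plus a pairwise matrix builder. It is not NINJA/WINDJAMMER's code: the function names (jukes_cantor, kimura_2p, distance_matrix) are hypothetical, and skipping gaps and ambiguity codes site by site is an assumption made only for this illustration.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def _compared_sites(a, b):
    """Yield aligned site pairs where both bases are unambiguous A/C/G/T."""
    # Assumption for this sketch: gaps and ambiguity codes are simply skipped.
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            yield x, y


def jukes_cantor(a, b):
    """JC69: d = -3/4 * ln(1 - 4/3 * p), p = proportion of differing sites."""
    sites = list(_compared_sites(a, b))
    if not sites:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in sites) / len(sites)
    arg = 1 - 4.0 / 3.0 * p
    if arg <= 0:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(arg)


def kimura_2p(a, b):
    """K2P: d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q).

    P = proportion of transitions (A<->G, C<->T), Q = proportion of transversions.
    """
    sites = list(_compared_sites(a, b))
    if not sites:
        raise ValueError("no comparable sites")
    n = len(sites)
    transitions = sum(
        x != y and ({x, y} <= PURINES or {x, y} <= PYRIMIDINES) for x, y in sites
    )
    transversions = sum(x != y for x, y in sites) - transitions
    P, Q = transitions / n, transversions / n
    arg1, arg2 = 1 - 2 * P - Q, 1 - 2 * Q
    if arg1 <= 0 or arg2 <= 0:
        return math.inf  # saturated
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)


def distance_matrix(seqs, dist=kimura_2p):
    """Symmetric pairwise distance matrix (list of lists) for aligned sequences."""
    n = len(seqs)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = dist(seqs[i], seqs[j])
    return m
```

For example, distance_matrix(["ACGTACGTAC", "ACGTACGTTC", "GCGTACCTAC"]) returns a 3x3 symmetric matrix with zeros on the diagonal, the kind of input a neighbor-joining step consumes. K2P separates transitions from transversions, whereas JC69 treats every substitution alike; both return infinity once the log argument is non-positive.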

Status:

RAxML

Tasks:

Phylogenetics Workflow and Perpetually Updating Tree

Workflow
A nascent workflow has been added to DNA Subway as an educational tool. It can serve as a model for integrating phylogenetic analysis tools into the DE.

Perpetually updated TOL
This work depends on completing the infrastructure for data matrix assembly, RAxML-lite tree building, tree visualization, etc., and is being scoped by the iPlant scientific project management team (Eric Lyons, Sheldon McKay, Matt Vaughn, Nicole Hopkins). Advice will be sought from the iPToL faculty on further requirements.

Tree Visualization

The shift in development priorities for the Discovery Environment led to the postponement of visualization-related projects. The group is therefore concentrating on developing the Tree Viewer further as a standalone application and plans to submit a publication describing it, probably early next year.
Deliverables:

Strategy:

Milestones:

Status:

Tasks: