Data Assembly
General Data Assembly
Deliverables:
- An industrial-strength pipeline for data assembly for very large data sets;
- Collaboration and Analysis Tools (contribute one's own data, download data, analyze data, keep data/results private) through the DE.
...
- Robust data upload capability (iRODS; Rion Dooley, Nirav Merchant), late 11Q1?
- Meta-data management (Rion Dooley, Nirav Merchant, Sudha Ram), early 11Q2;
- Robust data storage and retrieval, collaboration tools in DE (iPlant-wide requirement; core software), 11Q2;
- Advanced collaboration tools (iPlant-wide requirement; core software), 11Q3;
- Input data validation (core software in collaboration with data integration ETAs);
- Multiple sequence alignment generation:
- PHLAWD (John Cazes, Stephen Smith);
- Gordon Burleigh's pipeline (John Cazes, Eric Lyons, Stephen Smith);
- Muscle, other alignment strategies (John Cazes, Eric Lyons).
- Sequence database(s) of bad GIs for PHLAWD, etc. (Sheldon McKay and delegates), 11Q2.
My-Plant/My-Crop
Deliverables:
- My-Plant: Robust, widely used scientific collaboration network for the plant sciences based on a phylogeny metaphor;
- My-Crop: In support of the integrated breeding platform, a scientific interaction site as well as a data landing pad.
...
- Refactor back-end for generic 'clade' structure to facilitate other display/organization paradigms (Matthew Helmke), late 11Q1:
- Node based;
- Drupal core taxonomy.
- Implement Drupal module to ingest and provide basic search functionality for relevant literature citations from the user community and public databases (Steve Mock and delegates), 11Q2;
- Integration with Facebook and other similar sites. Initially with passive linkouts, possibly later with a Facebook page or app (Steve Mock and delegates), 11Q3;
- Integration (as consumer) of TNRS and iPlant tree viewer (Steve Mock; Matt Hanlon);
- My-Crop (different paradigm for display – also data repository):
- scoping, early 11Q2;
- early implementation, late 11Q2;
- full implementation, early 11Q3.
Trait Evolution
Deliverables:
- An infrastructure for trait analysis and ancestral character estimation.
...
- Tool integration in DE: R scripts and command-line tools (Naim Matasci and delegates), early 11Q2;
- Data uptake: files and external web-based data from TreeBASE (Sonya Lowry and delegates);
- Tree viz integration:
- New visualization needs (Kris Urie), early 11Q2;
- Call backs for new analyses (Adam Kubach, Sonya Lowry), mid 11Q2.
- DE integration:
- Integration (Sonya Lowry and delegates), late 11Q2;
- Metadata mapping (Naim Matasci), late 11Q2;
- Analysis and viewer integration (Sonya Lowry and delegates), late 11Q2.
- Code and documentation release (Naim Matasci, Matthew Helmke), 11Q2.
Tree Reconciliation and onekp
Deliverables:
- Applications to perform gene-species tree reconciliations and to visualize and analyze the evolution of gene families from the onekp project.
...
- Sequencing and transcript assemblies (onekp consortium; external);
- Gene cluster identification and alignment (Norm Wickett), ongoing;
- Continue to manage onekp data intake and tool implementation at TACC (Michael Gonzales, Sheldon McKay, Chris Jordan).
TNRS
Deliverables:
- A Taxonomic Name Resolution Service that:
- queries taxonomic data from Tropicos and other data services using the Global Names Index (GNI) architecture, allowing for different nomenclatures;
- recovers validated names using exact and fuzzy matching algorithms;
- inspects the taxonomic status of validated names and converts synonyms where applicable.
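The resolution steps listed above (exact match, fuzzy match, synonym conversion) can be illustrated with a minimal sketch. This is not the TNRS implementation: the in-memory `ACCEPTED` and `SYNONYMS` tables, the function name `resolve_name`, and the `cutoff` value are all assumptions for illustration, and Python's standard `difflib` stands in for whatever fuzzy matching algorithm the service actually uses.

```python
from difflib import get_close_matches

# Hypothetical reference data; a real service would query Tropicos or
# another taxonomic source instead of using in-memory tables.
ACCEPTED = {"Solanum lycopersicum", "Zea mays", "Arabidopsis thaliana"}
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def resolve_name(raw, cutoff=0.8):
    """Return (accepted_name, match_type); accepted_name is None if unresolved."""
    name = " ".join(raw.split())  # normalize whitespace
    known = sorted(ACCEPTED | set(SYNONYMS))

    if name in known:
        matched, kind = name, "exact"
    else:
        # Fuzzy step: closest known name above the similarity cutoff.
        close = get_close_matches(name, known, n=1, cutoff=cutoff)
        if not close:
            return None, "no match"
        matched, kind = close[0], "fuzzy"

    # Synonym step: convert a matched synonym to its accepted name.
    return SYNONYMS.get(matched, matched), kind


if __name__ == "__main__":
    for query in ["Zea  mays", "Arabidopsis thalianna", "Lycopersicon esculentum"]:
        print(query, "->", resolve_name(query))
```

Keeping the match type alongside the resolved name lets a caller flag fuzzy matches for manual review rather than accepting them silently.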
...
- UI redesign (Nicole Hopkins), 11Q1;
- Algorithm to handle synonyms (Jerry Lu), early 11Q2.
Big Trees
Deliverables:
- Computational infrastructure to build ToL.
NINJA/WINDJAMMER
Strategy:
- Optimization of NINJA (neighbor joining implementation) for HPC.
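For readers unfamiliar with neighbor joining, the selection step at the core of the method can be sketched as follows. This is the textbook criterion only, written as a plain O(n^2) scan; it does not reflect NINJA's optimizations or its HPC/MPI implementation, and the function name `nj_pick_pair` and the toy distance matrix are assumptions for illustration.

```python
def nj_pick_pair(d):
    """Pick the pair (i, j) minimizing the neighbor-joining criterion.

    Q(i, j) = (n - 2) * d[i][j] - sum_k d[i][k] - sum_k d[j][k]
    `d` is a symmetric distance matrix given as a list of lists.
    Returns ((i, j), q_value); ties go to the first pair found.
    """
    n = len(d)
    row_sums = [sum(row) for row in d]
    best_pair, best_q = None, None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best_q is None or q < best_q:
                best_pair, best_q = (i, j), q
    return best_pair, best_q


if __name__ == "__main__":
    # Toy 5-taxon distance matrix (illustrative values only).
    distances = [
        [0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0],
    ]
    print(nj_pick_pair(distances))
```

A full neighbor-joining run repeats this selection, merges the chosen pair into a new node, and recomputes distances until the tree is resolved; scanning the whole matrix at every iteration is the cost that makes large inputs expensive.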
...
- Completed, minor tweaking of MPI.
RAxML
Tasks:
- Implement RAxML-lite on Ranger, benchmark with various data sets (John Cazes), 11Q1;
- Implement web interface (relies on foundational API; Steve Mock and delegates), 11Q2.
Status:
- In progress
Phylogenetics Workflow and Perpetually Updating Tree
Workflow
A nascent workflow has been added to the DNA Subway as an educational tool. This can serve as a model for integrating phylogenetic analysis tools into the DE.
...
- The basic strategy is an automated workflow that will sync with GenBank or another data repository, build or iterate on a character matrix, re-run the tree building, and update the Discovery Environment.
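One cycle of the strategy above might be sketched as below. Everything here is hypothetical: `update_cycle`, `fetch_new`, `build_tree`, and `publish` are placeholder names, the character matrix is reduced to a taxon-to-sequence dictionary, and no real GenBank or Discovery Environment API is called. The callables are passed in so that the repository sync, the tree builder, and the DE update can each be swapped independently.

```python
from typing import Callable, Dict

Matrix = Dict[str, str]  # taxon name -> aligned characters (toy representation)


def update_cycle(
    matrix: Matrix,
    fetch_new: Callable[[], Matrix],
    build_tree: Callable[[Matrix], str],
    publish: Callable[[str], None],
) -> bool:
    """Run one sync/rebuild cycle; return True if the tree was rebuilt."""
    # 1. Sync: ask the repository for records (hypothetical callable).
    records = fetch_new()

    # 2. Keep only records that are new or changed relative to the matrix.
    changed = {taxon: seq for taxon, seq in records.items()
               if matrix.get(taxon) != seq}
    if not changed:
        return False  # nothing to do; skip the expensive tree build

    # 3. Iterate on the character matrix in place.
    matrix.update(changed)

    # 4. Re-run tree building and push the result downstream.
    publish(build_tree(matrix))
    return True


if __name__ == "__main__":
    published = []
    matrix: Matrix = {}
    fetch = lambda: {"A": "ACGT", "B": "ACGA"}
    # Stand-in "tree builder": a star tree in Newick over the current taxa.
    star = lambda m: "(" + ",".join(sorted(m)) + ");"
    print(update_cycle(matrix, fetch, star, published.append), published)
    print(update_cycle(matrix, fetch, star, published.append), published)
```

Skipping the rebuild when nothing has changed matters in practice, because tree building is by far the most expensive step in the cycle.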
Tree Visualization
Deliverables:
- An interactive tree viewer that:
- Makes it possible to view large trees as a stand-alone tool;
- Makes the green plant ToL and sub-trees available in the iPlant DE;
- Meets the visualization needs of Trait Evolution and Tree Reconciliation and of other applications in the Discovery Environment.
...