Data Assembly

General Data Assembly

Deliverables:

  • An industrial-strength data assembly pipeline for very large data sets;
  • Collaboration and analysis tools in the DE that let users contribute their own data, download and analyze data, and keep data and results private.

...

  • Robust data upload capability (iRODS; Rion Dooley, Nirav Merchant), late 11Q1 (date tentative);
  • Metadata management (Rion Dooley, Nirav Merchant, Sudha Ram), early 11Q2;
  • Robust data storage and retrieval, plus collaboration tools in the DE (iPlant-wide requirement; core software), 11Q2;
  • Advanced collaboration tools (iPlant-wide requirement; core software), 11Q3;
  • Input data validation (core software in collaboration with data integration ETAs);
  • Multiple sequence alignment generation:
    • PHLAWD (John Cazes, Stephen Smith);
    • Gordon Burleigh's pipeline (John Cazes, Eric Lyons, Stephen Smith);
    • Muscle, other alignment strategies (John Cazes, Eric Lyons).
  • Sequence database(s) and bad-GI lists for PHLAWD, etc. (Sheldon McKay and delegates), 11Q2.

My-Plant/My-Crop

Deliverables:

  • My-Plant: Robust, widely used scientific collaboration network for the plant sciences based on a phylogeny metaphor;
  • My-Crop: a scientific interaction site and data landing pad in support of the Integrated Breeding Platform.

...

  • Refactor the back end around a generic 'clade' structure to support other display/organization paradigms (Matthew Helmke), late 11Q1:
    • Node based;
    • Drupal core taxonomy.
  • Implement Drupal module to ingest and provide basic search functionality for relevant literature citations from the user community and public databases (Steve Mock and delegates), 11Q2;
  • Integration with Facebook and similar sites, initially through passive linkouts and possibly later through a Facebook page or app (Steve Mock and delegates), 11Q3;
  • Integration (as a consumer) of TNRS and the iPlant tree viewer (Steve Mock; Matt Hanlon);
  • My-Crop (a different display paradigm; also a data repository):
    • scoping, early 11Q2;
    • early implementation, late 11Q2;
    • full implementation, early 11Q3.

Trait Evolution

Deliverables:

  • An infrastructure for trait analysis and ancestral character estimation.

...

  • Tool integration in the DE: R scripts and command-line tools (Naim Matasci and delegates), early 11Q2;
  • Data uptake: files and external web-based data from TreeBASE (Sonya Lowry and delegates);
  • Tree viz integration:
    • New visualization needs (Kris Urie), early 11Q2;
    • Callbacks for new analyses (Adam Kubach, Sonya Lowry), mid 11Q2.
  • DE integration:
    • Integration (Sonya Lowry and delegates), late 11Q2;
    • Metadata mapping (Naim Matasci), late 11Q2;
    • Analysis and viewer integration (Sonya Lowry and delegates), late 11Q2.
  • Code and documentation release (Naim Matasci, Matthew Helmke), 11Q2.

Tree Reconciliation and onekp

Deliverables:

  • Applications that perform gene-species tree reconciliations and use them to visualize and analyze the evolution of gene families from the onekp project.
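A common way to reconcile a gene tree with a species tree is LCA mapping: each gene-tree node is mapped to the smallest species-tree clade containing the species of all genes below it, and a node that maps to the same clade as one of its children is inferred to be a duplication (otherwise a speciation). The sketch below illustrates only that idea; it assumes rooted binary trees encoded as nested tuples and a hypothetical gene-to-species table, and it is not the onekp tool chain.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical encoding: a rooted binary tree is a leaf name or a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]
Clade = FrozenSet[str]


def species_clades(tree: Tree) -> List[Clade]:
    """Leaf sets of every node of the species tree (i.e. its clades)."""
    out: List[Clade] = []

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            leaves: Clade = frozenset([node])
        else:
            leaves = walk(node[0]) | walk(node[1])
        out.append(leaves)
        return leaves

    walk(tree)
    return out


def lca(species: Clade, clades: List[Clade]) -> Clade:
    """Smallest species-tree clade containing all of the given species."""
    return min((c for c in clades if species <= c), key=len)


def reconcile(gene_tree: Tree, species_tree: Tree,
              species_of: Dict[str, str]) -> List[Tuple[str, Clade]]:
    """LCA-map every internal gene-tree node; return (event, clade) pairs in post-order."""
    clades = species_clades(species_tree)
    events: List[Tuple[str, Clade]] = []

    def walk(node: Tree) -> Clade:
        if isinstance(node, str):
            return lca(frozenset([species_of[node]]), clades)
        left, right = walk(node[0]), walk(node[1])
        here = lca(left | right, clades)
        # Mapping to the same species clade as a child implies a duplication.
        kind = "duplication" if here in (left, right) else "speciation"
        events.append((kind, here))
        return here

    walk(gene_tree)
    return events


species_tree = (("A", "B"), "C")
gene_tree = (("a1", "b1"), ("a2", "c1"))
print(reconcile(gene_tree, species_tree, {"a1": "A", "b1": "B", "a2": "A", "c1": "C"}))
```

In this toy example the root of the gene tree maps to the same species clade as its right child, so it is reported as the single duplication; real reconciliation tools also handle unrooted or non-binary trees and losses, which this sketch omits.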

...

  • Sequencing and transcript assemblies (onekp consortium; external);
  • Gene cluster identification and alignment (Norm Wickett), ongoing;
  • Continue to manage onekp data intake and tool implementation at TACC (Michael Gonzales, Sheldon McKay, Chris Jordan).

TNRS

Deliverables:

  • A Taxonomic Name Resolution Service that:
    • queries taxonomic data from Tropicos and other data services through the Global Names Index (GNI) architecture, allowing for different nomenclatures;
    • recovers validated names using exact and fuzzy matching algorithms;
    • inspects the taxonomic status of validated names and converts synonyms to accepted names where applicable.
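The exact-then-fuzzy matching and synonym conversion steps above can be sketched as follows. This is a minimal illustration, not the TNRS implementation: the TNRS fuzzy-matching algorithm is not specified here, so the sketch swaps in the similarity ratio from Python's standard `difflib` module, and the tiny reference list, the `resolve` function and the 0.8 cutoff are all hypothetical.

```python
import difflib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resolution:
    query: str
    matched: Optional[str]   # reference name that was matched, if any
    match_type: str          # "exact", "fuzzy" or "none"
    score: float             # difflib similarity ratio in [0, 1]
    accepted: Optional[str]  # accepted name after synonym conversion


def normalize(name: str) -> str:
    """Collapse whitespace and capitalize only the genus, e.g. 'zea  MAYS' -> 'Zea mays'."""
    return " ".join(name.split()).capitalize()


def resolve(query: str, names: List[str], synonyms: Dict[str, str],
            cutoff: float = 0.8) -> Resolution:
    name = normalize(query)
    if name in names:
        matched, match_type, score = name, "exact", 1.0
    else:
        # Fall back to the single closest reference name above the cutoff.
        close = difflib.get_close_matches(name, names, n=1, cutoff=cutoff)
        if not close:
            return Resolution(query, None, "none", 0.0, None)
        matched, match_type = close[0], "fuzzy"
        score = difflib.SequenceMatcher(None, name, matched).ratio()
    # Synonyms map to their accepted name; accepted names map to themselves.
    return Resolution(query, matched, match_type, score, synonyms.get(matched, matched))


# Toy reference data (hypothetical); Lycopersicon esculentum is a synonym of Solanum lycopersicum.
NAMES = ["Zea mays", "Solanum lycopersicum", "Solanum tuberosum", "Lycopersicon esculentum"]
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}

print(resolve("solanum lycopersicon", NAMES, SYNONYMS))
print(resolve("Lycopersicon esculentum", NAMES, SYNONYMS))
```

Keeping the match type and score in the result mirrors the deliverable's separation of name recovery from status inspection: a caller can accept exact matches automatically and route low-scoring fuzzy matches to a human.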

...

  • UI redesign (Nicole Hopkins), 11Q1;
  • Algorithm to handle synonyms (Jerry Lu), early 11Q2.

Big Trees

Deliverables:

  • Computational infrastructure to build the Tree of Life (ToL).

NINJA/WINDJAMMER

Strategy:

  • Optimization of NINJA (a neighbor-joining implementation) for HPC.
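For reference, the textbook neighbor-joining loop that NINJA implements at scale can be sketched as below. This is a plain O(n^3) illustration, assuming a dense pairwise distance table and simple taxon names; it is not NINJA's code and has none of its memory or HPC optimizations. The `neighbor_joining` function and its inputs are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def neighbor_joining(names: List[str], dist: Dict[Tuple[str, str], float]) -> str:
    """Build an unrooted tree (Newick string) from pairwise distances.

    `dist` needs one entry per unordered pair, under either key order.
    """
    D = dict(dist)

    def d(x: str, y: str) -> float:
        return D[(x, y)] if (x, y) in D else D[(y, x)]

    nodes = list(names)
    if len(nodes) < 3:
        raise ValueError("need at least three taxa")
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(d(x, y) for y in nodes if y != x) for x in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d(*p) - r[p[0]] - r[p[1]])
        li = d(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d(i, j) - li
        # The new cluster's Newick text doubles as its node label.
        u = f"({i}:{li:g},{j}:{lj:g})"
        for k in nodes:
            if k not in (i, j):
                D[(u, k)] = (d(i, k) + d(j, k) - d(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Three nodes left: attach them to one central node (three-point formula).
    x, y, z = nodes
    lx = (d(x, y) + d(x, z) - d(y, z)) / 2
    return f"({x}:{lx:g},{y}:{d(x, y) - lx:g},{z}:{d(x, z) - lx:g});"


DIST = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
        ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
        ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
print(neighbor_joining(["a", "b", "c", "d", "e"], DIST))
```

Each iteration scans all remaining pairs to find the minimum Q value, which is the cubic-time bottleneck that large-tree implementations such as NINJA are designed to avoid.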

...

  • Completed, apart from minor tweaking of the MPI code.

RAxML

Tasks:

  • Implement RAxML-Light on Ranger and benchmark it with various data sets (John Cazes), 11Q1;
  • Implement web interface (relies on foundational API; Steve Mock and delegates), 11Q2.

Status:

  • In progress

Phylogenetics Workflow and Perpetually Updating Tree

Workflow

A nascent workflow has been added to DNA Subway as an educational tool. It can serve as a model for integrating phylogenetic analysis tools into the DE.

...

  • The basic strategy is an automated workflow that syncs with GenBank or another data repository, builds or iterates on a character matrix, reruns tree building, and updates the Discovery Environment.

Tree Visualization

Deliverables:

  • An interactive tree viewer that:
    • Makes it possible to view large trees as a stand-alone tool;
    • Makes the green plant ToL and sub-trees available in the iPlant DE;
    • Meets the visualization needs of Trait Evolution and Tree Reconciliation and of other applications in the Discovery Environment.

...