TR_04JAN10

Action items

Agenda

Notes

In attendance: Former user (Deleted), Sheldon McKay, Former user (Deleted), ane (Unlicensed)

Discussion started with the Gene Name Mapping issue introduced Dec. 21, 2009.

Naming issues and taxonomic intelligence are a pervasive problem. For the prototype, an ad hoc intermediate solution will be needed. In early iterations, working with a captive set of gene families means that a solution for gene name mapping can be delayed. We could also run BLAST to overcome the mapping issue altogether.

This issue is part of a broader challenge that will be solved in a more crosscutting way. It is on the radar of iPToL and is part of a potential collaboration with BIEN (supporting work w/ Val Tannen, Bill Piel, Jerry Lu). For the first iteration, a small search space (quick, simple) could be enough of a fix for the naming problem. We don't want to invest a huge amount of time in this. A lookup table could be sufficient. Bill Piel may have thoughts on other solutions (or suggestions on the lookup table approach).
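A minimal sketch of what the lookup table approach could look like for a small, captive set of gene families. The function names, the alias rows, and the gene identifiers are all hypothetical placeholders; nothing here reflects an agreed design.

```python
def build_lookup(rows):
    """Build an alias -> canonical-name table from (alias, canonical) pairs.

    Aliases are compared case-insensitively. A conflicting alias raises
    an error so that ambiguous names are caught early.
    """
    lookup = {}
    for alias, canonical in rows:
        key = alias.strip().upper()
        existing = lookup.get(key)
        if existing is not None and existing != canonical:
            raise ValueError(f"alias {alias!r} maps to both {existing} and {canonical}")
        lookup[key] = canonical
    return lookup


def map_names(names, lookup):
    """Split input names into a mapped dict and a list of unmapped names."""
    mapped, unmapped = {}, []
    for name in names:
        canonical = lookup.get(name.strip().upper())
        if canonical is None:
            unmapped.append(name)
        else:
            mapped[name] = canonical
    return mapped, unmapped


if __name__ == "__main__":
    # Hypothetical aliases and identifiers, for illustration only.
    table = build_lookup([
        ("geneA", "FAM0001_G1"),
        ("gene-a", "FAM0001_G1"),
        ("geneB", "FAM0002_G1"),
    ])
    mapped, unmapped = map_names(["GeneA", "geneB", "geneZ"], table)
    print(mapped)
    print(unmapped)
```

Names that fall through to the unmapped list are the ones that would need a fallback such as the BLAST search mentioned earlier.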

Discussion of "Prototype Thoughts and Assumptions"

Sheldon led a walk-through of each bullet of the thoughts/assumptions.

  • May mix and match sources for best coverage. Prototype will use worked examples (may be quite a few), no user-supplied data.

Tame datasets are needed for testing.

We're not concerned about the different resources. We need to identify test cases for development purposes. Ensembl Plants is not ready for consideration (only 9 species included).

  • Full length CDS from ATG to stop codon?

With respect to data (1KP), are complete coding sequences needed (from start to stop codon)?

Cecile did not think so. We only need to know the reading frame. BLAST can be used against a fully annotated gene. The start codon is needed, but the sequence does not have to run to the end of the protein; it just needs to align properly.

  • AA-guided nucleotide alignments for Phyml/treebest; may need to do (or redo) the alignments.

[do not have coverage in notes for this bullet-point]

  • Is there a divergence threshold that we consider too divergent for DNA-based ML? If so, is it measured as raw distance, Ks, or Ka?

Sheldon asked whether we might reach a point where the cDNA sequences are too saturated. Is there a threshold where you need to switch from DNA to protein sequences? Cecile said people look at distance and determine a threshold. Sometimes sequences cannot be aligned at the DNA level (too distant), at which point you can align at the amino acid level and go back to DNA. (An appropriate threshold was not stated.)
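The two ideas in this discussion, measuring raw distance and aligning at the amino acid level before going back to DNA, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the input is assumed to be an in-frame CDS that starts at the first aligned residue, and no threshold value is implied since none was stated in the meeting.

```python
def thread_codons(aligned_protein, cds):
    """Map an in-frame CDS onto a gapped protein alignment row.

    Each residue takes the next codon from the CDS; each gap ('-')
    becomes a three-base gap ('---').
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    residues = sum(1 for char in aligned_protein if char != "-")
    if residues > len(codons):
        raise ValueError("CDS is too short for the aligned protein")
    out, index = [], 0
    for char in aligned_protein:
        if char == "-":
            out.append("---")
        else:
            out.append(codons[index])
            index += 1
    return "".join(out)


def p_distance(seq_a, seq_b):
    """Raw (uncorrected) distance: fraction of differing ungapped columns."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy sequences, for illustration only.
    row_a = thread_codons("M-K", "ATGAAA")
    row_b = thread_codons("MPK", "ATGCCCAAG")
    print(row_a)
    print(row_b)
    print(p_distance(row_a, row_b))
```

Raw distance is only one of the measures raised in the bullet; Ks and Ka would need a codon substitution model and are not covered by this sketch.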

  • Species trees require internal node labels for treebest (may need to edit).

This is an interoperability issue. Infrastructure/functionality is needed to supply labels where they do not exist. Jim mentioned in a previous meeting that the creation and use of 'temp' labels would be fine.
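One way to supply 'temp' labels is to rewrite the Newick string before handing it to the next tool. The sketch below is an assumption about how this could be done, not an agreed approach: the function name and the "TEMP" prefix are hypothetical, and it assumes labels are unquoted and contain no parentheses.

```python
import re
from itertools import count


def label_internal_nodes(newick, prefix="TEMP"):
    """Add a label to every internal node that does not already have one.

    In Newick, an internal node label follows its closing parenthesis.
    A ')' followed directly by ':', ',', ')', ';' or '[' has no label,
    so a generated one is inserted there. Existing labels are left alone.
    """
    counter = count(1)

    def add_label(match):
        return ")" + f"{prefix}{next(counter)}"

    return re.sub(r"\)(?=[:,);\[])", add_label, newick)


if __name__ == "__main__":
    print(label_internal_nodes("((human,chimp),mouse);"))
    print(label_internal_nodes("((human,chimp)Primates:0.1,mouse);"))
```

Labels are numbered in the order the closing parentheses appear, so the same input always yields the same temporary labels.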

  • treebest gives us gene trees. Want to reconcile downstream of treebest, use 'reconcile' and primetv to show reconciled trees graphically.

TreeBeST gives us gene trees guided by the species trees, but it does not create a reconciled tree. Cecile believes that it would give the number of gene duplication events.

Need placement of gene duplication & loss events.

Discussion tabled for further investigation.
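For reference while this is investigated, the sketch below shows the common LCA-mapping idea for placing duplication events: each gene tree node is mapped to the lowest common ancestor of its children's species, and a node is called a duplication when it maps to the same species node as one of its children. This is a generic textbook technique, not a description of what TreeBeST, 'reconcile' or PRiMETV do. The data structures and names are hypothetical, and loss events are not inferred here.

```python
def ancestors(parent, node):
    """Return the path from a species node up to the root, node first."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(parent, a, b):
    """Lowest common ancestor of two species nodes."""
    seen = set(ancestors(parent, a))
    for node in ancestors(parent, b):
        if node in seen:
            return node
    raise ValueError(f"{a} and {b} share no ancestor")


def place_duplications(gene_tree, gene_to_species, parent, duplications):
    """Map a gene tree onto a species tree and record duplication nodes.

    gene_tree is a gene name (leaf) or a pair of subtrees. Returns the
    species node that the subtree maps to; duplications is filled with
    the species node of every inferred duplication.
    """
    if isinstance(gene_tree, str):
        return gene_to_species[gene_tree]
    left, right = gene_tree
    left_sp = place_duplications(left, gene_to_species, parent, duplications)
    right_sp = place_duplications(right, gene_to_species, parent, duplications)
    here = lca(parent, left_sp, right_sp)
    if here in (left_sp, right_sp):
        duplications.append(here)
    return here


if __name__ == "__main__":
    # Toy species tree as a child -> parent map; the root has no entry.
    parent = {"human": "primates", "chimp": "primates",
              "primates": "root", "mouse": "root"}
    genes = {"h1": "human", "h2": "human", "c1": "chimp", "m1": "mouse"}
    found = []
    place_duplications((("h1", "c1"), ("h2", "m1")), genes, parent, found)
    print(found)
```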

Discussed ATV & 'fat tree' along with PRiMETV. Still need to work out visualization scenarios and see what works best. Ideally, the tool generating the graphical representation of the reconciled tree would be something that can be controlled from the command line. This would make it easy to wrap the tool and make the output (an image file) available in the prototype web application. A web application will be using this so we must steer clear of desktop applications.
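Wrapping a command-line renderer for the web application could look roughly like the sketch below. It is tool-agnostic on purpose: no visualization tool has been chosen, so the caller passes the full command, and the function name and error handling are assumptions. The demo uses the Python interpreter as a stand-in renderer so the example runs without any external tool.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def render_tree_image(command, output_path, timeout=60):
    """Run an external renderer and return the path of the image it wrote.

    command is the full argument list for the tool. The call fails loudly
    if the tool exits with an error or does not produce the output file.
    """
    result = subprocess.run(command, capture_output=True, text=True, timeout=timeout)
    if result.returncode != 0:
        raise RuntimeError(f"renderer failed: {result.stderr.strip()}")
    image = Path(output_path)
    if not image.is_file():
        raise RuntimeError(f"renderer did not create {image}")
    return image


if __name__ == "__main__":
    # Stand-in for a real renderer: a one-line script that writes a file.
    with tempfile.TemporaryDirectory() as workdir:
        target = Path(workdir) / "tree.png"
        stand_in = [sys.executable, "-c",
                    "import sys; open(sys.argv[1], 'wb').write(b'placeholder')",
                    str(target)]
        print(render_tree_image(stand_in, target).name)
```

Keeping the command as a parameter means the same wrapper can be reused once a specific tool and its options are settled.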

  • Want to be able to view: alignments, gene/species trees, reconciled trees

Cecile said that we do not need to visualize alignments. The question of whether the prototype needed to perform alignments came up. The assumption has been that the prototype will start the workflow with alignments. Cecile confirmed that many users will have their alignments ready. There are lots of tools for that, and molecular biologists can do the alignments themselves. Later, integration can handle this; it would be an added bonus but is not a priority.

  • treebest package compiled C binaries (good for integration with web app)

Not covered; the statement was just acknowledged.

  • treefam database schema comes with perl API (good) and tied with treebest, consider using sub-set of treefam schema for prototype back-end, get the schema and API for free.

The treefam schema is seen as a bonus. The functionality of the prototype will need to be integrated by Core Software. Resolving the treefam schema and the internal data model (triples) used in the Discovery Environment was raised as a concern. It should not be an issue and the treefam schema is of interest as a 'best practice' for dealing with such data.

Moving forward...

Sheldon asked Adam to consider the visualization side of the prototype ('roughing out' a pipeline, web vis options, etc.).

Sheldon to get sample data and wrap TreeBeST.

Andy to think about UI in Google Web Toolkit (GWT) and gather requirements for analytical pipeline.

Proposed moving to a bi-weekly meeting schedule and increasing offline communication. This will help the prototyping effort.