DA_25FEB10

Agenda

  • Sharon to provide summary of pipeline requirements including common components and shared elements from discussions with Gordon and Stephen
  • Discussion/update on Faceplant project
  • Jerry to provide update on BIEN collaboration including current status of resolution service and upcoming meeting in St. Louis
  • Open discussion

Action Items

Meeting Notes

  • Participants: Pam, Doug, Val, Sharon, Jerry
  • Sharon reported on her progress learning the data assembly pipelines from Gordon and Stephen. The components shared by both pipelines are:
    • an iPlant sequence database to hold sequence data from GenBank (and potentially other sources), synchronized with GenBank, with a user interface or API to mark and query the data
    • defining homologous sets: the two pipelines take different approaches, but both use BLAST, filter the BLAST results by various criteria, and handle (record) reverse complements
    • multiple sequence alignments (MSAs), with different QC approaches: one prunes trees with manual inspection informed by the phylogenetic backbone, the other uses profile alignment with MDA scores
    • MSA concatenation to generate the superMatrix, which is then fed to RAxML
  • Sharon also updated the group on the facePlant discussion
  • Jerry reported on the progress of the BIEN collaboration (deliverable, upcoming meeting)
    • Pam suggested that the upcoming meeting in St. Louis could, and perhaps should, include Nico from TOLKIN
  • The attendees strongly recommended talking with Alex to find out what form the superMatrix should take: a set of multi-FASTA files, or some kind of database
  • Val raised the issue of storing multiple sequence alignments in databases, and the challenge of running multiple sequence alignments on huge datasets (e.g., 500,000 sequences)
  • Pam suggested bringing the facePlant people together with the APWEB people, since the popularity of APWEB could help bring users to facePlant
  • Pam also mentioned that, in addition to GenBank sequences, the data assembly infrastructure should allow users to upload their own sequence data for phylogenetic analysis in combination with GenBank data
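For reference, the MSA concatenation step shared by both pipelines could look roughly like the sketch below. It is a minimal illustration, not code from either pipeline, and it assumes that each per-locus alignment is already in memory as a mapping from taxon name to aligned sequence. Taxa missing from a locus are padded with gaps, and the result is written as relaxed PHYLIP text plus a partition file in the classic RAxML "DNA, name = start-end" style. The function names and locus names are hypothetical.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # hypothetical layout: taxon name -> aligned sequence


def concatenate(alignments: List[Alignment]) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Concatenate per-locus alignments into a supermatrix.

    Taxa absent from a locus are padded with '-' for that locus' length.
    Returns the supermatrix and 1-based inclusive (start, end) ranges per locus.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    start = 1
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("each alignment must be non-empty with equal-length sequences")
        (length,) = lengths
        for taxon in taxa:
            # Pad missing taxa so every row stays the same total length.
            rows[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((start, start + length - 1))
        start += length
    return {taxon: "".join(parts) for taxon, parts in rows.items()}, partitions


def to_phylip(matrix: Dict[str, str]) -> str:
    """Format the supermatrix as relaxed sequential PHYLIP text."""
    nchar = len(next(iter(matrix.values()))) if matrix else 0
    lines = [f"{len(matrix)} {nchar}"]
    lines += [f"{name} {seq}" for name, seq in matrix.items()]
    return "\n".join(lines) + "\n"


def to_partitions(partitions: List[Tuple[int, int]], names: List[str]) -> str:
    """Write one 'DNA, name = start-end' line per locus (RAxML-style)."""
    if len(names) != len(partitions):
        raise ValueError("need exactly one name per partition")
    return "".join(f"DNA, {name} = {s}-{e}\n" for name, (s, e) in zip(names, partitions))


if __name__ == "__main__":
    rbcL = {"A": "ACGT", "B": "ACGA"}
    matK = {"A": "TTG", "C": "TTA"}
    matrix, parts = concatenate([rbcL, matK])
    print(to_phylip(matrix), end="")
    print(to_partitions(parts, ["rbcL", "matK"]), end="")
```

With the two toy loci above, taxon B gets gaps for matK and taxon C gets gaps for rbcL, so the matrix has 3 taxa and 7 columns. Keeping the partition ranges alongside the matrix means per-locus models can still be applied after concatenation. For datasets on the scale Val mentioned, the in-memory dictionaries would likely need to be replaced by streaming or database-backed storage.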