Sharon updated the group on her progress learning the data assembly pipelines from Gordon and Stephen; the components shared by both pipelines are:
an iPlant sequence database to hold sequence data from GenBank (and potentially other sources), synchronized with GenBank, with a user interface or API to mark and query data
definition of homologous sets by different approaches; both use BLAST, filter the BLAST results with various criteria, and deal with (record) reverse complements
multiple sequence alignments, with different QC approaches: one uses tree pruning with manual inspection guided by knowledge of the phylogenetic backbone, the other uses profile alignment with MDA scores
MSA concatenation to generate the superMatrix, which is fed to RAxML
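
The concatenation step in the last item can be sketched as follows. This is only an illustration and not code from either Gordon's or Stephen's pipeline: the function name concatenate_alignments, the gene names, and the taxon labels are all hypothetical, and it assumes that a taxon missing from a gene is padded with gap characters. The partition lines follow the "DNA, name = start-end" layout that RAxML accepts in a partition file.

```python
def concatenate_alignments(alignments):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: dict mapping gene name -> dict mapping taxon -> aligned sequence.
    Returns (supermatrix, partitions), where supermatrix maps taxon -> concatenated
    sequence and partitions is a list of RAxML-style partition lines.
    Hypothetical helper for illustration only.
    """
    # Union of all taxa seen in any gene alignment.
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1  # partition coordinates are 1-based and inclusive

    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences in {gene} are not all the same length")
        length = lengths.pop()

        for taxon in taxa:
            # Assumption: taxa absent from this gene are padded with gaps.
            pieces[taxon].append(aln.get(taxon, "-" * length))

        partitions.append(f"DNA, {gene} = {start}-{start + length - 1}")
        start += length

    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy input: two genes, three taxa, with incomplete taxon sampling.
example = {
    "rbcL": {"taxonA": "ACGT", "taxonB": "AC-T"},
    "matK": {"taxonA": "GG", "taxonC": "GA"},
}
matrix, parts = concatenate_alignments(example)
for taxon, seq in matrix.items():
    print(taxon, seq)
for line in parts:
    print(line)
```

Keeping the partition boundaries next to the matrix means the per-gene blocks can still be recovered later, whichever storage form (files or database) is chosen for the superMatrix.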
Sharon also updated the group on the facePlant discussion
Jerry updated the group on the progress of the collaboration with BIEN (deliverable, upcoming meeting)
Pam suggested the upcoming meeting in St Louis might/should include Nico from TOLKIN
the meeting attendees strongly suggested talking with Alex to find out what form the superMatrix should take, whether a set of multiple FASTA files or some database
Val brought up the issue of storing multiple sequence alignments in databases, and the challenge of running multiple sequence alignments on huge datasets, such as 500,000 sequences
Pam suggested bringing the facePlant people together with the APWEB people; the popularity of APWEB could help bring users to facePlant
Pam also mentioned that, in addition to GenBank sequences, the data assembly infrastructure should allow users to upload their own sequence data for phylogenetic analysis in combination with GenBank data