Discussion of MPI pre-processing of distance matrix
Notes
Minutes
NINJA/WINDJAMMER: Development of the MPI implementation of Neighbor-Joining in WINDJAMMER is complete, and the program's behavior mirrors NINJA both internally and externally, except for externalized memory.
Robert reports that initial estimates indicate a 20-35x performance improvement over NINJA, but more detailed and rigorous comparative benchmarking is needed. The test was run on ca. 2,500 nodes.
Adding more nodes would not further improve performance, as the limiting factor is the amount of available memory. Sheldon indicates that users might be interested in a non-parallel version of the program and that the source code should be made available via a publicly accessible repository.
Pre-processing of distance matrix. Generating the pairwise distance matrix for 218K taxa takes 1 to 2 days and is a task that can be parallelized. It is a key requirement of this collaboration that this function be added to WINDJAMMER.
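To give a sense of the scale involved, the number of pairwise comparisons can be worked out directly. This is only an illustrative calculation: it assumes "218K" means exactly 218,000 taxa, which the minutes do not state, and the function name is made up for this note.

```python
def n_pairs(n_taxa):
    """Number of unordered pairs among n_taxa sequences: n * (n - 1) / 2."""
    return n_taxa * (n_taxa - 1) // 2

# Assumption: "218K" is taken as exactly 218,000 taxa.
pairs = n_pairs(218_000)
print(f"{pairs:,} pairwise distances")
```

Under that assumption the matrix has roughly 23.8 billion entries, each of which can be computed independently of the others.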
Rob suggests an algorithm in which the sequences are divided into blocks of 1000 and each block is sent to a node, which performs all pairwise comparisons among its 1000 sequences. In a second step, the blocks of sequences are exchanged among nodes so that every block is eventually compared to every other block. The only roadblocks to an implementation are the inclusion of a pairwise alignment function and file format issues.
Travis will work with Rob to solve these problems by helping to extract the relevant C code from quicktree and by helping with the file format issues.
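The block scheme Rob describes can be sketched roughly as follows. This is a serial illustration under stated assumptions, not the WINDJAMMER implementation: ranks are simulated with a loop instead of MPI calls, block pairs are assigned round-robin (the minutes do not specify a schedule), and p_distance is a stand-in that assumes pre-aligned sequences of equal length, since the real pairwise alignment function is still to be extracted from quicktree. All function names here are hypothetical.

```python
from itertools import combinations

def split_blocks(n_seqs, block_size):
    """Return (start, stop) index ranges that cover 0..n_seqs in blocks."""
    return [(s, min(s + block_size, n_seqs))
            for s in range(0, n_seqs, block_size)]

def block_tasks(n_blocks):
    """Step 1: each block against itself. Step 2: every unordered block pair."""
    within = [(b, b) for b in range(n_blocks)]
    between = list(combinations(range(n_blocks), 2))
    return within + between

def p_distance(a, b):
    """Placeholder distance (assumption): fraction of mismatching positions."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def compute_task(seqs, blocks, task):
    """Compute all distances between the two blocks named by the task."""
    (i0, i1), (j0, j1) = blocks[task[0]], blocks[task[1]]
    out = {}
    for i in range(i0, i1):
        for j in range(j0, j1):
            if i < j:  # keep only the upper triangle, skip self-comparisons
                out[(i, j)] = p_distance(seqs[i], seqs[j])
    return out

def distance_matrix(seqs, block_size, n_workers):
    """Simulate n_workers ranks, each handling every n_workers-th task."""
    blocks = split_blocks(len(seqs), block_size)
    tasks = block_tasks(len(blocks))
    dist = {}
    for rank in range(n_workers):
        # In an MPI program each rank would run only its own slice.
        for task in tasks[rank::n_workers]:
            dist.update(compute_task(seqs, blocks, task))
    return dist

if __name__ == "__main__":
    toy = ["ACGT", "ACGA", "TCGA", "ACGT", "GGGG"]
    d = distance_matrix(toy, block_size=2, n_workers=3)
    print(len(d), "distances computed")
```

With blocks of 1000 and an assumed 218,000 sequences there would be 218 blocks, giving 218 within-block tasks plus 23,653 block-pair tasks to distribute across nodes.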
Action items
A1: Robert will provide benchmarking data comparing the performance of WINDJAMMER and NINJA for different sized trees.
A2: Robert will work on a prototype implementation of the parallel matrix computation algorithm.
The next meeting will take place on Tuesday, Oct 6th at 9 AM Eastern/10 AM Central.