This document shows how a bootstrap replicate analysis with PHYLIP's SEQBOOT, DNAML and CONSENSE could be decomposed to run on Condor.
- Perl installed
- PHYLIP installed
The program used for the basic phylogenetic inference step is DNAML, an implementation of the maximum likelihood method described here.
PHYLIP programs are all driven by interactive menus, so for convenience I use a small Perl wrapper that takes as arguments the program name (DNAML), a command file to be passed on STDIN, and the output file name(s).
- The command file is simple: it just gives the name of the alignment file and 'Y' to proceed.
- The DNAML run is invoked as follows (prefixed with time to measure it):
- So a single run takes about six seconds.
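The Perl wrapper itself is not reproduced here. The following minimal Python sketch illustrates the same idea. It feeds a command file to a PHYLIP program on STDIN and then renames the fixed output files that PHYLIP writes (DNAML writes outfile and outtree). The function names are hypothetical. The sketch assumes the executable is on the PATH (for example as dnaml). It also assumes that PHYLIP prompts before overwriting an existing output file, which is why stale outputs are removed first.

```python
import os
import shutil
import subprocess

# Sketch only: a Python stand-in for the author's Perl wrapper (names are hypothetical).


def dnaml_commands(alignment_path):
    """Command-file text for DNAML: the alignment file name, then 'Y' to accept the menu."""
    return f"{alignment_path}\nY\n"


def run_phylip(program, commands, outputs):
    """Run a PHYLIP program with `commands` on STDIN, then rename its outputs.

    `outputs` maps the fixed names PHYLIP writes (e.g. 'outfile', 'outtree')
    to the names we want to keep.
    """
    # Assumption: PHYLIP asks before overwriting an existing output file,
    # which would throw the scripted answers out of step, so clear them first.
    for fixed in outputs:
        if os.path.exists(fixed):
            os.remove(fixed)
    subprocess.run([program], input=commands, text=True, check=True)
    for fixed, wanted in outputs.items():
        shutil.move(fixed, wanted)
```

For example: run_phylip("dnaml", dnaml_commands("alignment.phy"), {"outfile": "dnaml.out", "outtree": "dnaml.tree"}). The file names in this example are illustrative. If a file literally named infile is present, DNAML reads it without asking for a name, and the first line of the command file would likely be consumed by the menu instead.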
What makes this more complicated is the need for bootstrap replicate analysis.
- Consider doing 2000 replicates: at about six seconds each, that is roughly 12,000 seconds, or more than 3 hours. This is the part that could be parallelized.
- The program used to create the 2000 replicate data sets is SEQBOOT. The command file for this is:
This tells SEQBOOT to generate 2000 bootstrap replicates, using 777 as the random number seed.
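You can also generate such command files instead of writing them by hand. This Python sketch assumes the SEQBOOT menu sequence of PHYLIP 3.6x, which runs in this order:

1. The input file name (asked only when no file named infile exists).
2. R to change the number of replicates, followed by the count.
3. Y to accept the settings.
4. The random number seed, which SEQBOOT requires to be odd.

Check this order against your version's menu before relying on it. The function name is hypothetical.

```python
def seqboot_commands(alignment_path, replicates=2000, seed=777):
    """Build SEQBOOT command-file text (hypothetical helper).

    Assumed menu order (PHYLIP 3.6x): file name, 'R' + replicate count,
    'Y' to accept, then the random number seed.
    """
    if seed % 2 == 0:
        raise ValueError("SEQBOOT requires an odd random number seed")
    lines = [
        alignment_path,   # only consumed if no file called 'infile' exists
        "R",              # menu option: number of replicates
        str(replicates),
        "Y",              # accept the menu settings
        str(seed),        # seed is requested after the menu is accepted
    ]
    return "\n".join(lines) + "\n"
```

With the defaults this reproduces the settings described above: 2000 replicates with seed 777.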
It is run with the incantation:
Dividing up the work (this is where Condor would come in).
The replicate data file starts each data set with this header, so we can use it as a delimiter for splitting:
Now we have to split the one big file containing 2000 data sets into 2000 separate data files. The script below does this:
- We now have a "split" subdirectory containing the 2000 replicate files. These would be processed in parallel (one DNAML run per file), and the resulting tree files would need to be consolidated afterwards.
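The splitting step can be sketched in Python along the same lines: start a new chunk each time the header line reappears. This is not the original script. It assumes every replicate begins with an identical header line, as described above. The output names (split/replicate_0.phy and so on) are hypothetical and use 0-based numbering so that they line up with a 0-based job index.

```python
import os


def split_replicates(text, header):
    """Split concatenated SEQBOOT output into one string per replicate.

    A new replicate starts at every line equal to `header`
    (surrounding whitespace ignored).
    """
    chunks = []
    for line in text.splitlines(keepends=True):
        if line.strip() == header.strip():
            chunks.append([])
        if chunks:  # skip anything before the first header
            chunks[-1].append(line)
    return ["".join(chunk) for chunk in chunks]


def write_replicates(path, outdir="split"):
    """Write each replicate in `path` to its own file in `outdir`; return the count."""
    with open(path) as fh:
        text = fh.read()
    header = text.splitlines()[0]  # the file begins with the first replicate's header
    os.makedirs(outdir, exist_ok=True)
    chunks = split_replicates(text, header)
    for i, chunk in enumerate(chunks):
        # Hypothetical naming scheme: replicate_0.phy ... replicate_N-1.phy
        with open(os.path.join(outdir, f"replicate_{i}.phy"), "w") as out:
            out.write(chunk)
    return len(chunks)
```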
Once all 2000 DNAML instances have run on the 2000 files, there will be 2000 trees (each named 'outtree' by DNAML).
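One way to hand the runs to Condor is a single submit description ending in queue 2000, where the $(Process) macro runs from 0 to 1999. The Python sketch below writes such a description, and everything in it is an illustrative assumption:

- run_dnaml.sh is a hypothetical wrapper that would run DNAML on the replicate named in its argument.
- The replicate_N.phy names come from a 0-based split.
- The trees and logs directories must exist before submitting.
- DNAML must be available on the execute machines, either installed there or added to transfer_input_files.

Because every job produces a file called outtree, transfer_output_remaps renames each copy as it comes back, so the trees do not overwrite each other.

```python
def make_submit_description(n_jobs, wrapper="run_dnaml.sh", split_dir="split"):
    """Return an HTCondor submit description queuing one DNAML job per replicate.

    All file and directory names here are hypothetical placeholders.
    """
    lines = [
        "universe                = vanilla",
        f"executable              = {wrapper}",
        "arguments               = replicate_$(Process).phy",
        f"transfer_input_files    = {split_dir}/replicate_$(Process).phy",
        "should_transfer_files   = YES",
        "when_to_transfer_output = ON_EXIT",
        # Each job writes 'outtree'; give every returned copy a unique name.
        'transfer_output_remaps  = "outtree = trees/outtree.$(Process)"',
        "output                  = logs/job.$(Process).out",
        "error                   = logs/job.$(Process).err",
        "log                     = dnaml.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"
```

Writing make_submit_description(2000) to a file (say dnaml.sub) and passing it to condor_submit would queue all 2000 jobs as one cluster.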
- The trees can be consolidated into one file that would look like this (this example has 10 trees):
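Consolidation is just concatenation. The following minimal Python sketch assumes the trees came back as trees/outtree.0 through trees/outtree.1999 (hypothetical names). It sorts them numerically so that outtree.10 follows outtree.9, and it makes sure each tree ends with a newline before appending it to treefile.txt. Order does not affect the consensus, so the sort only keeps the file tidy.

```python
import glob
import os
import re


def replicate_index(path):
    """Numeric suffix of a name like 'outtree.17' (used as a sort key)."""
    return int(re.search(r"(\d+)$", path).group(1))


def join_trees(tree_texts):
    """Concatenate tree texts, making sure each one ends with a newline."""
    return "".join(text.rstrip("\n") + "\n" for text in tree_texts)


def consolidate(tree_dir="trees", out_path="treefile.txt"):
    """Write every outtree.N in `tree_dir` into `out_path`; return how many were found."""
    # Hypothetical layout: trees/outtree.0, trees/outtree.1, ...
    paths = sorted(glob.glob(os.path.join(tree_dir, "outtree.*")), key=replicate_index)
    texts = []
    for p in paths:
        with open(p) as fh:
            texts.append(fh.read())
    with open(out_path, "w") as out:
        out.write(join_trees(texts))
    return len(paths)
```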
Now we can compute the majority-rule consensus tree (again using the 10-tree example above) with the program CONSENSE.
The command file:
where treefile.txt is the file of consolidated trees.
The resulting consensus file looks like this:
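The Python sketch below shows what CONSENSE is counting. It implements plain majority-rule consensus for unrooted trees. Each internal branch splits the taxa into two groups, and a split is kept if it occurs in more than half of the trees. The fraction recorded for each split corresponds to the support values, which CONSENSE reports as numbers of trees.

This is an illustration, not CONSENSE's implementation. By default CONSENSE uses extended majority rule, which also adds compatible groups below 50%. The minimal Newick parser here ignores branch lengths, assumes all trees share the same taxa, and does not handle quoted labels or comments.

```python
import re
from collections import Counter


def parse_newick(text):
    """Parse a Newick tree into nested lists of leaf names (topology only).

    Minimal parser: whitespace and branch lengths are dropped; quoted
    labels and [comments] are not supported.
    """
    text = re.sub(r"\s+", "", text)
    text = re.sub(r":[^,();]*", "", text).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # consume '('
            children = [node()]
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # consume ')'
            while pos < len(text) and text[pos] not in ",)":
                pos += 1  # skip an optional internal-node label
            return children
        start = pos
        while pos < len(text) and text[pos] not in ",)":
            pos += 1
        return text[start:pos]

    return node()


def leaves(tree):
    """Set of leaf names under a parsed (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree.

    Each split is stored as the side NOT containing a fixed reference taxon,
    so the same bipartition always gets the same key.
    """
    taxa = leaves(tree)
    ref = min(taxa)
    found = set()

    def walk(t):
        below = leaves(t)
        side = taxa - below if ref in below else below
        if 1 < len(side) < len(taxa) - 1:  # skip single taxa and the root
            found.add(side)
        if not isinstance(t, str):
            for child in t:
                walk(child)

    walk(tree)
    return found


def majority_rule(newick_trees):
    """Map each split found in more than half of the trees to its frequency."""
    counts = Counter()
    for text in newick_trees:
        counts.update(splits(parse_newick(text)))
    n = len(newick_trees)
    return {split: count / n for split, count in counts.items() if count > n / 2}
```

For three five-taxon trees in which the group C, D, E appears twice, majority_rule keeps only that group, with frequency 2/3.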