PHYLIP_CONTRAST_Example

Purpose

The purpose of this example is to demonstrate how to use the CONTRAST program in PHYLIP to perform a phylogenetic independent contrasts analysis on continuous character data with pre-made trees. We use a real data set of two continuous characters for 49 mammal species, as well as a synthetic data set consisting of a 50K-species tree and data for two continuous characters. This document also introduces the file formats used as input to CONTRAST.

Prerequisites

  • Access to a Unix/Linux shell
  • PHYLIP installed
  • Perl (for the conversion scripts)
  • Access to Mesquite
  • NEXUS data files

Preparing the data files

Input file formats for CONTRAST are described in the CONTRAST documentation. The examples used here have a single representative for each species. Within-species variation is also supported by CONTRAST but is not shown in these examples.
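For the species-means case used here, a CONTRAST infile is a header line giving the number of species and the number of characters, followed by one line per species whose first 10 characters are the species name. The Perl sketch below builds such a file in memory; the sub name `write_contrast_infile` is hypothetical, and the three taxa are copied from the PDAP.nex sample shown later, with names truncated to 10 characters.

```perl
use strict;
use warnings;

# Build the text of a CONTRAST infile for species means only (no within-species data).
# Each argument is [name, value1, value2, ...]; names are padded/truncated to 10 characters.
sub write_contrast_infile {
    my @taxa  = @_;
    my $nchar = @{ $taxa[0] } - 1;                 # values per species
    my $text  = sprintf("%6d%6d\n", scalar @taxa, $nchar);
    for my $row (@taxa) {
        my ($name, @values) = @$row;
        $text .= sprintf('%-10.10s', $name) . ' ' . join(' ', @values) . "\n";
    }
    return $text;
}

# Three taxa from the PDAP.nex sample (log body mass, log home range)
my $infile = write_contrast_infile(
    [ 'Ursus_maritimus',  '2.423245874', '2.062957834' ],
    [ 'Ursus_arctos',     '2.400192489', '1.918030337' ],
    [ 'Ursus_americanus', '1.970346876', '1.754348336' ],
);
print $infile;
```

This prints a header of `     3     2` followed by rows such as `Ursus_mari 2.423245874 2.062957834`.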

Please note also that the Mesquite desktop GUI is used here only to demonstrate the file formats; it is not scalable or suitable for inclusion in a DE.

  • The file PDAP.nex was provided by Brian Omeara. It is a NEXUS format file that contains character data. The file header indicates it was generated by Peter Midford.
    [written Mon Nov 24 19:18:14 CST 2008 by Mesquite  version 2.5 (build j77) at 88.99.124.24.cm.sunflower.com/24.124.99.88 (Peter Midford)]
  • Converting the NEXUS file to input suitable for PHYLIP involved a few steps:
    1) Load PDAP.nex into Mesquite. The character data are log values of body mass and home range (sample below).
    BEGIN CHARACTERS;
            TITLE  Body_mass_and_home_range;
            DIMENSIONS  NCHAR=2;
            FORMAT DATATYPE = CONTINUOUS GAP = - MISSING = ?;
    CHARSTATELABELS 
                    1 log_Body_Mass,
                    2 log_Home_Range ; 
            MATRIX
            Ursus_maritimus            2.423245874 2.062957834
            Ursus_arctos               2.400192489 1.918030337
            Ursus_americanus           1.970346876 1.754348336
            Nasua_narica               0.6434526765 0.0211892991
            Procyon_lotor              0.84509804 0.0569048513
            Mephitis_mephitis          0.3979400087 0.3979400087
            Meles_meles                1.064457989 -0.060480747
            Canis_lupus                1.547774705 2.307067951
    
    2) Export the character data: File->Export->Tab delimited continuous data file->PDAP.txt
    3) Export the tree data in PHYLIP (Newick) format: File->Export->Phylip (trees)->PDAP.tree.fel
    4) Modify the files for PHYLIP using the ad hoc Perl script below (fix_PDAP.pl). Two things need to be done: make sure the taxon labels are exactly 10 characters long (required by PHYLIP), and make sure they correspond exactly between the character data file (PDAP.txt) and the tree file (PDAP.tree.fel).
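    #!/usr/bin/perl
    # fix_PDAP.pl: writes the PHYLIP infile to STDOUT and renames the taxa in PDAP.tree.fel to match
    use strict;
    use warnings;
    my $tree_file = 'PDAP.tree.fel';
    my (%seen, %tree_label, @rows, $nchar);
    while (<>) {
      next unless /\S/;                               # skip blank lines
      my ($taxon, @values) = split;
      $nchar //= @values;
      # pad or truncate label to make it exactly 10 characters (required by PHYLIP)
      my $base  = sprintf('%-10.10s', $taxon);
      my $label = $base;
      # if truncation creates a duplicate, replace the end of the label with 1, 2, ...
      for (my $n = 1; $seen{$label}; $n++) {
        $label = substr($base, 0, 10 - length($n)) . $n;
      }
      $seen{$label} = 1;
      ($tree_label{$taxon} = $label) =~ s/\s+$//;    # padding is not needed inside the tree
      push @rows, join(' ', $label, @values);
    }
    # PHYLIP header: number of species and number of characters
    printf "%6d%6d\n", scalar @rows, $nchar;
    print "$_\n" for @rows;
    
    # rename whole tip names (text after "(" or ",") in the tree file, in a single pass
    open my $in, '<', $tree_file or die "Cannot read $tree_file: $!";
    my $tree = do { local $/; <$in> };
    close $in;
    $tree =~ s/(?<=[(,])([^(),:;]+)/exists $tree_label{$1} ? $tree_label{$1} : $1/ge;
    open my $out, '>', $tree_file or die "Cannot write $tree_file: $!";
    print {$out} $tree;
    close $out;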
    

Creating the PHYLIP infile (PDAP.fel):

$ perl fix_PDAP.pl PDAP.txt > PDAP.fel
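Because CONTRAST matches species in the tree to species in the infile by name, it can help to confirm that every label in PDAP.fel also occurs as a tip name in PDAP.tree.fel before running the program. The Perl sketch below is one way to do that; `missing_from_tree` is a hypothetical helper, it assumes the infile layout described above (header line, then a 10-character name at the start of each row), and it ignores the trailing blanks on padded names.

```perl
use strict;
use warnings;

# Return the infile labels (trailing blanks removed) that do not appear as tip
# names in the Newick tree text. Tip names are taken to be the text that
# follows "(" or "," up to the next "(", ")", ",", ":", ";" or blank.
sub missing_from_tree {
    my ($infile_text, $tree_text) = @_;
    my %tips;
    $tips{$1} = 1 while $tree_text =~ /[(,]\s*([^(),:;\s]+)/g;

    my @rows = split /\n/, $infile_text;
    shift @rows;                                          # drop the header line
    my @missing;
    for my $row (@rows) {
        next unless $row =~ /\S/;
        (my $label = substr($row, 0, 10)) =~ s/\s+$//;   # name = first 10 characters
        push @missing, $label unless $tips{$label};
    }
    return @missing;
}

# Toy example (branch lengths made up); in practice read PDAP.fel and PDAP.tree.fel.
my $infile = "     3     2\n"
           . "Ursus_mari 2.423245874 2.062957834\n"
           . "Ursus_arct 2.400192489 1.918030337\n"
           . "Ursus_amer 1.970346876 1.754348336\n";
my $tree = '((Ursus_mari:1.0,Ursus_arct:1.0):1.0,Ursus_americanus:2.0);';
my @missing = missing_from_tree($infile, $tree);
print @missing ? "Missing from tree: @missing\n" : "All labels found in tree\n";
```

Here the tree still carries the untruncated name Ursus_americanus, so the sketch reports `Ursus_amer` as missing.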

Running CONTRAST

  • Create the command file (command.txt). The responses below name the data and tree files, select the 'C' option to print out the contrasts, and accept the settings with 'Y'.
    PDAP.fel
    PDAP.tree.fel
    C
    Y
    
    Use the perl script run_phylip.pl to execute CONTRAST and save the results as the file PDAP.contrasts.txt
    $ ./run_phylip.pl contrast command.txt PDAP.contrasts.txt
    Done. Outfile saved as PDAP.contrasts.txt. Program output saved as 'contrast.out'
    $ more contrast.out
    
    Continuous character comparative analysis, version 3.69
    
    Settings for this run:
      W        Within-population variation in data?  No, species values are means
      R     Print out correlations and regressions?  Yes
      C                        Print out contrasts?  Yes
      M                     Analyze multiple trees?  No
      0         Terminal type (IBM PC, ANSI, none)?  ANSI
      1          Print out the data at start of run  No
      2        Print indications of progress of run  Yes
    
      Y to accept these or type the letter for one to change
    
    Output written to file "outfile"
    
    Done.
    
    $ head -20 PDAP.contrasts.txt 
    
    Contrasts (columns are different characters)
    --------- -------- --- --------- -----------
    
       0.00001   0.00007
       0.00015   0.00008
      -0.00006  -0.00001
     -1053.85746 724.82686
       0.00001  -0.00008
       0.00100   0.00115
       0.00019   0.00029
       0.00002  -0.00010
       0.00012   0.00035
       0.00016   0.00041
       0.00007   0.00012
      -0.00004  -0.00022
      -0.00009   0.00025
       0.00001  -0.00027
      -0.00024  -0.00037
      -0.00006  -0.00001
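CONTRAST already prints correlations and regressions (option R above). To illustrate what those are computed from, the Perl sketch below reads the two-column contrast listing and computes the slope and correlation through the origin, which is how independent contrasts are usually analyzed. It assumes the layout shown above (a title line beginning with "Contrasts", an underline, then one row per contrast); the sub names are hypothetical, and the numbers are not guaranteed to match CONTRAST's own regression output exactly.

```perl
use strict;
use warnings;

# Regression slope of y on x and correlation, both through the origin,
# computed from pairs of contrasts [x, y].
sub origin_regression {
    my @pairs = @_;
    my ($sxx, $syy, $sxy) = (0, 0, 0);
    for my $p (@pairs) {
        my ($x, $y) = @$p;
        $sxx += $x * $x;
        $syy += $y * $y;
        $sxy += $x * $y;
    }
    return ($sxy / $sxx, $sxy / sqrt($sxx * $syy));
}

# Read the two-column block that follows the "Contrasts" title in CONTRAST output.
sub read_contrasts {
    my ($fh) = @_;
    my ($in_block, @pairs) = (0);
    while (my $line = <$fh>) {
        if (!$in_block) { $in_block = $line =~ /^\s*Contrasts/; next; }
        next if !@pairs && $line =~ /^[\s-]*$/;   # underline and blank lines before the data
        my @f = split ' ', $line;
        last unless @f == 2;                       # the block ends at the first other line
        push @pairs, [@f];
    }
    return @pairs;
}

# First rows of the PDAP listing shown above; in practice open PDAP.contrasts.txt.
my $listing = <<'END';
Contrasts (columns are different characters)
--------- -------- --- --------- -----------

   0.00001   0.00007
   0.00015   0.00008
  -0.00006  -0.00001
END
open my $fh, '<', \$listing or die "Cannot open listing: $!";
my @pairs = read_contrasts($fh);
my ($slope, $r) = origin_regression(@pairs);
printf "%d contrasts, slope = %.4f, r = %.4f\n", scalar @pairs, $slope, $r;
```

Forcing the fit through the origin reflects the fact that the sign of each contrast is arbitrary, so the expected mean of each contrast column is zero.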

Testing CONTRAST with a synthetic 50K data set

Original data files

  • The original data files, provided by Brian Omeara, contain a synthetic tree and synthetic character data.
  • 50K_final_continuous.nex contains the character data in NEXUS format.
  • 50K_final_newick.tre is the tree file in Newick format, which PHYLIP accepts. The tree is ultrametric (a rooted additive tree in which all terminal nodes are equally distant from the root), binary (every internal node bifurcates) and has only positive branch lengths (some methods, such as neighbor-joining (NJ), can produce negative branch lengths, which are not suitable for independent contrasts).
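The three tree properties listed above can be checked before running CONTRAST. The Perl sketch below parses a Newick string by recursive descent, then reports whether every internal node has exactly two children, whether every branch length is positive, and whether all root-to-tip distances agree within a tolerance. It assumes unquoted labels without blanks, no bracketed comments, and a tree small enough to hold in memory as a string; `parse_node` and `check_tree` are hypothetical names.

```perl
use strict;
use warnings;
no warnings 'recursion';    # deeply nested trees would otherwise trigger warnings

# Recursive-descent parse of one Newick subtree at pos($$s).
# Returns { children => [...], length => branch length or undef }.
sub parse_node {
    my ($s) = @_;
    my %node = (children => []);
    if ($$s =~ /\G\(/gc) {
        do { push @{ $node{children} }, parse_node($s) } while $$s =~ /\G,/gc;
        $$s =~ /\G\)/gc or die "expected ')' at offset " . (pos($$s) // 0) . "\n";
    }
    $$s =~ /\G[^(),:;]+/gc;                              # optional label, ignored
    $node{length} = $1 if $$s =~ /\G:([-+0-9.eE]+)/gc;  # optional branch length
    return \%node;
}

# Report whether the tree is binary, has only positive branch lengths, and is
# ultrametric (all root-to-tip distances equal within $tol).
sub check_tree {
    my ($newick, $tol) = @_;
    (my $t = $newick) =~ s/\s+//g;                       # assumes no blanks inside labels
    my $root = parse_node(\$t);
    $t =~ /\G;/gc or die "expected ';' at end of tree\n";

    my %report = (binary => 1, positive => 1);
    my @tip_depths;
    my @stack = ([$root, 0]);                            # iterative walk: [node, depth]
    while (my $item = pop @stack) {
        my ($node, $depth) = @$item;
        my @kids = @{ $node->{children} };
        if (!@kids) { push @tip_depths, $depth; next; }
        $report{binary} = 0 if @kids != 2;
        for my $kid (@kids) {
            my $len = $kid->{length};
            $report{positive} = 0 unless defined $len && $len > 0;
            push @stack, [$kid, $depth + ($len // 0)];
        }
    }
    my ($min, $max) = (sort { $a <=> $b } @tip_depths)[0, -1];
    $report{ultrametric} = $max - $min <= $tol ? 1 : 0;
    return \%report;
}

# Toy tree; for the real data, slurp 50K_final_newick.tre and pass its contents.
my $report = check_tree('((A:1,B:1):1,C:2);', 1e-9);
printf "%-11s %s\n", $_, $report->{$_} ? 'yes' : 'no' for sort keys %$report;
```

For a large tree built from floating-point branch lengths, a tolerance somewhat larger than 1e-9 may be needed for the ultrametric test.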

Converting to PHYLIP format

No modifications were made to the tree file. The character data were processed as follows:
  1. Open 50K_final_continuous.nex in Mesquite.
  2. Export it as simple text (File->Export->Tab delimited continuous data file->50K.continuous.txt).
  3. Use an ad hoc Perl script to create the PHYLIP file 50K.continuous.fel:
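    use strict;
    use warnings;
    
    print "   50000   2\n";                      # PHYLIP header: number of species, number of characters
    while (<>) {
      next unless /^taxon/;                      # keep only data rows (taxon names begin with "taxon")
      chomp;
      my ($taxon, $s1, $s2) = split;
      my $label = sprintf('%-10.10s', $taxon);   # pad or truncate to exactly 10 characters
      print join("\t", $label, $s1, $s2), "\n";
    }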
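Because the script above hard-codes the header line as 50000 species and 2 characters, a consistency check on the finished 50K.continuous.fel may be worthwhile. This Perl sketch (the sub name `check_infile` is hypothetical) compares the header counts with the number of data rows and the number of values on each row, assuming the name occupies the first 10 characters of every row.

```perl
use strict;
use warnings;

# Check that a CONTRAST infile's header ("species characters") agrees with its body.
# Returns a list of problem descriptions (empty if none were found).
sub check_infile {
    my ($fh) = @_;
    my $header = <$fh>;
    my ($nspecies, $nchar) = split ' ', ($header // '');
    return ('missing or malformed header line')
        unless defined $nchar && $nspecies =~ /^\d+$/ && $nchar =~ /^\d+$/;
    my ($rows, @problems) = (0);
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;
        $rows++;
        chomp $line;
        # everything after the 10-character name should be character values
        my @values = length($line) > 10 ? split(' ', substr($line, 10)) : ();
        push @problems, "line $.: expected $nchar values, found " . scalar(@values)
            if @values != $nchar;
    }
    push @problems, "header says $nspecies species but found $rows rows"
        if $rows != $nspecies;
    return @problems;
}

# Toy input with a deliberate error; in practice open 50K.continuous.fel instead.
my $text = "     2     2\ntaxon1    \t0.1\t0.2\ntaxon2    \t0.3\n";
open my $fh, '<', \$text or die "Cannot open input: $!";
my @problems = check_infile($fh);
if (@problems) { print "$_\n" for @problems } else { print "Header and rows agree\n" }
```

On the toy input this reports `line 3: expected 2 values, found 1`.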
    

Running and benchmarking CONTRAST

  • It took CONTRAST ~20 s to run the analysis on the 50K-taxon data set. The command file has the same layout as before but names the 50K data and tree files.
  • Results are saved as 50K.contrasts.txt.
    $ time ./run_phylip.pl contrast command.txt 50K.contrasts.txt
    Done. Outfile saved as 50K.contrasts.txt. Program output saved as 'contrast.out'
    
    real    0m19.535s
    user    0m18.975s
    sys     0m0.371s
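Rerunning CONTRAST on another data set only requires a command file with the same layout as the PDAP one: the data file name, the tree file name, 'C' to print contrasts, and 'Y' to accept the settings. A small Perl sketch for writing it is below; `write_command_file` is a hypothetical helper, and the response order simply mirrors the PDAP command file, assuming run_phylip.pl feeds the lines to CONTRAST's prompts in that order.

```perl
use strict;
use warnings;

# Write the interactive responses for CONTRAST, one per line:
# data file name, tree file name, 'C' (print contrasts), 'Y' (accept settings).
sub write_command_file {
    my ($path, $data_file, $tree_file) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "$_\n" for $data_file, $tree_file, 'C', 'Y';
    close $out or die "Cannot close $path: $!";
    return $path;
}

# Command file for the 50K benchmark run
write_command_file('command.txt', '50K.continuous.fel', '50K_final_newick.tre');
```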