PARAFIT

Prepared by: Liya Wang

Tool Name: parafit

Homepage: http://www.bio.umontreal.ca/casgrain/en/labo/parafit.html

Platforms: Mac OS X (powerpc), 32-bit DOS

Implementation Language: Fortran77

Overview:
ParaFit is a program that tests the hypothesis of coevolution between a clade of hosts and a clade of parasites. The null hypothesis of the global test is that the evolution of the two groups, as revealed by the two phylogenetic trees and the set of host-parasite association links, has been independent. The method requires estimates of the phylogenetic trees or phylogenetic distances, and also a description of the host-parasite associations (H-P links) observed in nature. The program produces two types of test: a global test of coevolution and a test on each H-P link.

Literature:
Legendre, P., Y. Desdevises and E. Bazin. 2002. A statistical test for host-parasite coevolution. Systematic Biology 51(2): 217-234

Related Tools: AxParafit, copycat

Input: There are three input files called A, B and C.

  • File A contains matrix A(n1 x n2) whose dimensions are n1 = number of parasites (rows of the matrix) and n2 = number of hosts (columns of the matrix). The file contains 1’s when a host-parasite link has been observed in nature between the host in the column and the parasite in the row, and 0’s otherwise.
  • File B contains a matrix of principal coordinates (n1 x n4) computed from either a matrix of phylogenetic distances or a matrix of patristic distances among the parasites. A matrix of patristic distances represents exactly the information in a phylogenetic tree. Note the dimension n4 of this file. Since there are at most (n – 1) principal coordinates for n objects, n4 is at most equal to (n1 – 1).
  • File C contains the transpose (n3 x n2) of a matrix of principal coordinates of size (n2 x n3) computed from either a matrix of phylogenetic distances or a matrix of patristic distances among the hosts. Note the dimension n3 of this file. Since there are at most (n – 1) principal coordinates for n objects, n3 is at most equal to (n2 – 1).
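
As an illustration of how the three matrices relate, here is a minimal sketch in Python with NumPy. It is not part of ParaFit: the function name pcoa, the toy distances, and the tolerance used to drop null axes are all assumptions made for this example, and it uses classical (Gower) principal coordinate analysis without any correction for negative eigenvalues.

```python
import numpy as np

def pcoa(dist, tol=1e-10):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Returns an (n x k) matrix of principal coordinates, keeping only the
    axes with positive eigenvalues, so k <= n - 1.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n      # Gower centring matrix
    g = centre @ (-0.5 * d ** 2) @ centre
    eigval, eigvec = np.linalg.eigh(g)            # ascending eigenvalues
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > tol                           # drop null/negative axes
    return eigvec[:, keep] * np.sqrt(eigval[keep])

# Hypothetical toy data: 3 parasites (rows of A) and 3 hosts (columns of A).
A = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])                         # n1 x n2 link matrix
par_dist = np.array([[0, 2, 4], [2, 0, 4], [4, 4, 0]], dtype=float)
host_dist = np.array([[0, 2, 4], [2, 0, 4], [4, 4, 0]], dtype=float)

B = pcoa(par_dist)        # n1 x n4, parasites in rows
C = pcoa(host_dist).T     # n3 x n2, transposed so hosts are in columns
print(A.shape, B.shape, C.shape)
```

With three objects per group there are at most two principal coordinates, so n4 and n3 cannot exceed 2 here.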

Math:

  • D = CA'B
  • ParaFitGlobal = trace(D'D) = sum(d_ij^2)
  • TraceMax = max(sum of squared eigenvalues of B, sum of squared eigenvalues of C)
  • ParaFitLink1 = trace - trace(k)
  • ParaFitLink2 = [trace - trace(k)] / [TraceMax - trace]
  • Here trace = trace(D'D) is computed with all links present, and trace(k) is the same quantity recomputed after link k has been removed from A
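
The formulas above can be sketched in a few lines of Python with NumPy. This is an illustrative reading of the notes, not the Fortran implementation. The function names and the toy matrices are hypothetical, and trace_max assumes that the eigenvalue of each principal coordinate axis equals the sum of squares of that axis, which holds when coordinates are scaled to the square root of their eigenvalues.

```python
import numpy as np

def parafit_global(A, B, C):
    """trace(D'D) for D = C A' B, i.e. the sum of squared elements of D."""
    D = C @ A.T @ B
    return float(np.sum(D ** 2))

def trace_max(B, C):
    """Assumed TraceMax: eigenvalues taken as per-axis sums of squares."""
    eig_b = np.sum(B ** 2, axis=0)     # one eigenvalue per column of B
    eig_c = np.sum(C ** 2, axis=1)     # one eigenvalue per row of C
    return float(max(np.sum(eig_b ** 2), np.sum(eig_c ** 2)))

def parafit_links(A, B, C):
    """Return (parasite, host, ParaFitLink1, ParaFitLink2) for every link."""
    trace = parafit_global(A, B, C)
    denom = trace_max(B, C) - trace
    out = []
    for i, j in zip(*np.nonzero(A)):
        A_k = A.copy()
        A_k[i, j] = 0                               # remove link k
        link1 = trace - parafit_global(A_k, B, C)
        # Link2 is undefined when the denominator is zero
        link2 = link1 / denom if denom > 0 else float("nan")
        out.append((int(i), int(j), link1, link2))
    return out

# Hypothetical toy case: 2 parasites, 2 hosts, one principal axis each.
A = np.array([[1, 1],
              [0, 1]])
B = np.array([[1.0], [-1.0]])          # n1 x n4
C = np.array([[1.0, -1.0]])            # n3 x n2
print(parafit_global(A, B, C), trace_max(B, C))
for row in parafit_links(A, B, C):
    print(row)
```

In this toy case the link between parasite 0 and host 1 gets a negative ParaFitLink1, because removing it makes A the identity matrix and raises the trace to its maximum.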

Output: The output file (“Host-parasite.out”) contains the following information:

  • The dimension of each matrix (file)
  • Number of permutations
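  • Global test of cospeciation: ParaFitGlobal and Prob (the permutational probability, i.e. the proportion of permutations yielding a ParaFitGlobal value larger than or equal to the observed one; the null hypothesis is rejected when Prob is at or below the chosen alpha level)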
  • Individual test of cospeciation: ParaFitLink1 and Prob
  • Individual test of cospeciation: ParaFitLink2 and Prob
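
To show where a Prob value of this kind could come from, the following Python sketch runs a permutation test for the global statistic. It rests on two assumptions that should be checked against the paper: that the values within each row of A are permuted independently, and that the observed value is counted among the permutations (which would explain a smallest possible Prob of 0.001 with 999 permutations). The function names and toy data are hypothetical.

```python
import numpy as np

def parafit_global(A, B, C):
    """trace(D'D) for D = C A' B."""
    D = C @ A.T @ B
    return float(np.sum(D ** 2))

def global_test(A, B, C, n_perm=999, seed=0):
    """Permutational probability of the observed ParaFitGlobal value."""
    rng = np.random.default_rng(seed)
    observed = parafit_global(A, B, C)
    n_ge = 1                                   # the observed value counts once
    for _ in range(n_perm):
        # Assumption: shuffle the entries of each row of A independently
        A_perm = rng.permuted(A, axis=1)
        # Small tolerance guards against floating-point ties
        if parafit_global(A_perm, B, C) >= observed - 1e-12:
            n_ge += 1
    return observed, n_ge / (n_perm + 1)

# Hypothetical toy case: 2 parasites, 2 hosts, one principal axis each.
A = np.array([[1, 0],
              [0, 1]])
B = np.array([[1.0], [-1.0]])
C = np.array([[1.0, -1.0]])
stat, prob = global_test(A, B, C)
print(stat, prob)
```

A dataset this small cannot give a significant result; it only illustrates the mechanics of the test.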

Discussion:

  • With phylogenetic trees: tree --> patristic distance (among species) --> principal coordinate analysis
  • W/O phylogenetic trees: phylogenetic distances (from raw data: morphology, sequence, etc) --> principal coordinate analysis
  • Principal coordinates can be computed with the Principal Coordinate module of the R package or with DISTPCoA
  • ParaFitLink1 has greater power for correctly detecting coevolutionary links in saturated coevolutionary models that contain additional random links
  • ParaFitLink2 has greater power for correctly detecting coevolutionary links in unsaturated coevolutionary models, in which some links are random
  • ParaFitLink2 cannot be used in perfect coevolutionary situations (the denominator is zero)
  • The type II error rate of ParaFit decreases as the size of the dataset increases
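
The first route listed above (tree --> patristic distance) can be sketched as follows in Python. The tree encoding (a dictionary mapping each node to its parent and branch length), the node names, and the function names are assumptions made for this example. A real analysis would read the tree from a file with a phylogenetics library.

```python
from itertools import combinations

def root_path(tip, parent):
    """Map each ancestor of tip (and tip itself) to its distance from tip."""
    out = {tip: 0.0}
    node, total = tip, 0.0
    while node in parent:                 # stop at the root, which has no parent
        up, length = parent[node]
        total += length
        out[up] = total
        node = up
    return out

def patristic(tips, parent):
    """Matrix of path-length (patristic) distances among the tips."""
    paths = {t: root_path(t, parent) for t in tips}
    n = len(tips)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        pi, pj = paths[tips[i]], paths[tips[j]]
        # The most recent common ancestor gives the shortest joined path
        d[i][j] = d[j][i] = min(pi[a] + pj[a] for a in pi if a in pj)
    return d

# Hypothetical tree ((P1:1,P2:1)X:1,P3:2)R written as child -> (parent, length)
parent = {"P1": ("X", 1.0), "P2": ("X", 1.0), "X": ("R", 1.0), "P3": ("R", 2.0)}
tips = ["P1", "P2", "P3"]
dist = patristic(tips, parent)
for row in dist:
    print(row)
```

The resulting matrix would then go through principal coordinate analysis to produce file B (parasites) or file C (hosts).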

Example of output:

ParaFit: A test of host-parasite coevolution
Pierre Legendre
Département de sciences biologiques
Université de Montréal.
© Pierre Legendre, 2001
Matrix A = A(17X15)
Matrix B = B(17x16)
Matrix C = C(14x15)
Number of permutations: 999
Global test of cospeciation: ParaFitGlobal = 0.01390 Prob = 0.00100
Test of individual host-parasite links:
F1 = ParaFitLink1 F2 = ParaFitLink2
Parasite 1 Host 2 F1 = 0.00105 Prob1 = 0.04000 F2 = 0.09387 Prob2 = 0.00500
Parasite 2 Host 1 F1 = 0.00093 Prob1 = 0.02000 F2 = 0.08323 Prob2 = 0.00200
Parasite 3 Host 3 F1 = 0.00172 Prob1 = 0.00100 F2 = 0.15353 Prob2 = 0.00100
Parasite 4 Host 6 F1 = 0.00193 Prob1 = 0.00800 F2 = 0.17212 Prob2 = 0.00100
Parasite 5 Host 5 F1 = 0.00163 Prob1 = 0.00100 F2 = 0.14602 Prob2 = 0.00100
Parasite 6 Host 7 F1 = 0.00158 Prob1 = 0.01700 F2 = 0.14094 Prob2 = 0.00200
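
For downstream filtering, the per-link lines can be read with a short Python script. This is a sketch that assumes the layout shown in the example above (the real output file may use different spacing or extra columns). The names LINK_RE and parse_links and the 0.05 threshold are hypothetical.

```python
import re

# One link line: "Parasite i Host j F1 = x Prob1 = p F2 = y Prob2 = q"
LINK_RE = re.compile(
    r"Parasite\s+(\d+)\s+Host\s+(\d+)"
    r"\s+F1\s*=\s*(-?[\d.]+)\s+Prob1\s*=\s*([\d.]+)"
    r"\s+F2\s*=\s*(-?[\d.]+)\s+Prob2\s*=\s*([\d.]+)"
)

def parse_links(text):
    """Return one dictionary per host-parasite link found in text."""
    rows = []
    for m in LINK_RE.finditer(text):
        f1, prob1, f2, prob2 = (float(x) for x in m.groups()[2:])
        rows.append({
            "parasite": int(m.group(1)), "host": int(m.group(2)),
            "F1": f1, "Prob1": prob1, "F2": f2, "Prob2": prob2,
        })
    return rows

sample = """Parasite 1 Host 2 F1 = 0.00105 Prob1 = 0.04000 F2 = 0.09387 Prob2 = 0.00500
Parasite 2 Host 1 F1 = 0.00093 Prob1 = 0.02000 F2 = 0.08323 Prob2 = 0.00200"""

links = parse_links(sample)
# Keep links that both tests flag at an assumed 0.05 level
significant = [r for r in links if r["Prob1"] <= 0.05 and r["Prob2"] <= 0.05]
print(len(links), len(significant))
```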

Background:

1. History

  • 1913, Fahrenholz’s Rule – parasite phylogeny mirrors host phylogeny
  • 1940, Szidat’s Rule – primitive hosts harbour primitive parasites
  • 1964, ‘coevolution’ introduced in a study on butterflies and their plant hosts; defined as the extent to which the host and parasite phylogenetic trees are congruent, where congruence refers to the degree to which parasites and their hosts occupy corresponding positions in the phylogenetic trees.
  • 1980s, rigorous analytical methods for H-P coevolution

2. Tools Developed: These tools ask what the most probable coevolutionary history of the host-parasite association is, given the costs of the different events. They combine the event types so as to minimize the overall cost of the estimated evolutionary scenario.

  • Brooks parsimony analysis, BPA, 1991
  • Component analysis, Component, 1993
  • Based on reconciled phylogenetic trees, TreeMap, 1994
  • Event-based, TreeFitter, 1997
  • Jungles, 1998

3. Limitation: Ideally designed for the one host-one parasite case; highly computer-intensive for multiple hosts and parasites, making optimal solutions hard to find.

4. ParaFit Paper:

  • Null Hypothesis: the evolution of the two groups has been independent as revealed by the two phylogenetic trees and the set of H-P association links.
  • Coevolution requires that (1) the ‘true phylogenies’ are the same and (2) the hosts and parasites located in corresponding positions of their respective trees are associated (linked).