Friday, October 22, 2010

Testing for Trait-Dependent Molecular Evolution

Itay Mayrose & Sally Otto have just published (Molecular Biology and Evolution Advance Access) a neat new method to test the hypothesis of a discrete extrinsic cause for shifts in the rate of molecular evolution on a phylogeny.

In this method, the authors first obtain an ultrametric phylogenetic tree for the species in their study. They then generate a set of stochastic character histories (Nielsen, 2002; Huelsenbeck et al., 2003) for the discrete character of interest. Example discrete characters might be a "life history trait, morphological feature, or habitat association"; in their empirical test they examine halophilic and freshwater Daphnia species.

Now armed with a distribution of possible character histories on their estimated phylogeny, the authors jointly estimate, by maximum likelihood, the parameters of their sequence evolution model and a scaling factor r, a parameter that increases or suppresses the rate of molecular evolution along the portions of the tree that a stochastic map assigns to a given character state. They then average across character maps.
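To make the role of r concrete, here is a minimal toy sketch, not the authors' implementation. It assumes a binary character in which state 0 evolves at a baseline rate and state 1 at r times that rate, a single branch whose ancestral and descendant sequences are both known, a Jukes-Cantor (JC69) substitution model, and that "averaging across character maps" means averaging the likelihood. All function names and the (state, duration) representation of a stochastic map are hypothetical.

```python
import math


def effective_length(segments, r, base_rate=1.0):
    """Rate-scaled length of one branch under one stochastic map.

    segments: list of (state, duration) pairs (hypothetical format).
    Assumption: state 0 evolves at base_rate, state 1 at r * base_rate.
    """
    return sum(base_rate * (r if state == 1 else 1.0) * duration
               for state, duration in segments)


def jc69_same(d):
    """JC69 probability that a site shows the same base after distance d."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)


def branch_likelihood(segments, r, n_same, n_diff):
    """Toy likelihood for one branch with both end sequences known.

    n_same: number of sites with identical bases at both ends.
    n_diff: number of sites with different bases at the two ends.
    """
    p_same = jc69_same(effective_length(segments, r))
    p_diff = (1.0 - p_same) / 3.0  # change to one specific other base
    return p_same ** n_same * p_diff ** n_diff


def average_over_maps(maps, r, n_same, n_diff):
    """Average the toy likelihood across a set of stochastic maps."""
    return sum(branch_likelihood(m, r, n_same, n_diff) for m in maps) / len(maps)
```

With r = 1 the mapped states make no difference and the effective length is just the branch duration; values of r above or below 1 stretch or shrink only the time spent in state 1. In a real analysis the averaged likelihood would be computed over the whole tree and maximized over r together with the substitution model parameters.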

In an extremely clear analysis of their method, the authors show that it can produce remarkably good estimates of r for trees with even a modest number of tips (e.g., 20-60) when the true underlying phylogeny is known without error (Figure panel A). Under these idealized circumstances, estimation of r is only slightly biased for small numbers of species, as is common for maximum likelihood methods.

The situation is slightly more complicated when an estimated phylogeny (rather than the true underlying tree and branch lengths) is used. Here, they show that estimates of r can be severely biased downward, particularly for large values of r (Figure panel B). They attribute this to error in the ultrametricization of their phylogenies, since in their study they used the same data for phylogenetic inference as for the estimation of r. This problem is not ameliorated when ultrametric phylogenies are obtained by Bayesian relaxed clock methods. In the end, this issue argues strongly for the simultaneous estimation of the phylogeny, the character history, and the concomitant variation in nucleotide substitution rates, something that the authors also recommend.


Glor said...

Cool stuff.

mwpennell said...

I think this is a really cool paper. The method presented in this paper is really intriguing as it provides insight into the biological causes of rate heterogeneity.
One point that is worth considering is the authors' suggestion that "directly accounting for trait-specific rates of evolution during the phylogenetic inference step may have a profound impact on estimating phylogenies and divergence times." Though they make a strong case for this in this study, it is unclear (at least to me) how this would be applied to phylogenetic reconstruction. In order to estimate the relative rate parameter, one would have to specify the trait (or in future implementations, traits) hypothesized to cause rate heterogeneity a priori. Different traits may have different effects (or perhaps no effect at all) and I think that the resulting phylogeny would be dependent on this choice. Does anyone have any thoughts on this matter?

Dan Rabosky said...

This is a really cool idea. I wonder whether a model that allows phylogenetic autocorrelation of rates across the tree will soak up much of this variation, if we don't know (or don't care about) the actual traits. Such models are already implemented in BEAST.

I just wonder whether we really need to know the traits to account for this variation... Also, if a character does affect rates of molecular evolution and thus our ability to reconstruct phylogeny/divergence times etc., I would think that we could test for this by comparing the relative abilities of phylogenetic and non-phylogenetic models of rate variation across the tree. If the non-phylogenetic model fits as well or better, maybe it is unlikely that trait-dependent (or environment-dependent) molecular evolution is influencing our results (assuming that traits/environments have phylogenetic signal...).

Dan Rabosky said...

I correct myself...I *thought* the phylogenetic model was implemented in BEAST, but looks like the latest version has uncorrelated models only...