Monday, April 6, 2009

New Insight on Size Free Morphometrics

An interesting discussion on how to remove size from phylogenetic comparative analyses of morphological data is playing out on the R-sig-phylo list-serv (a forum about the use and development of phylogenetic and comparative methods within the R platform). This topic has been contentious for some time, and the ongoing discussion should be of interest to anyone looking to analyze morphological data in a phylogenetic context. Several heavy hitters (Ted Garland & Joe Felsenstein) and Dechronization bloggers (Dan Rabosky & Liam Revell) have already weighed in with insightful remarks.


Liam Revell said...

Yes, this was an interesting discussion.

Regarding size-correction, several important points came up:

1) DR proposed that if size-correction is performed using ordinary (non-phylogenetic) least-squares regression, the estimate of the slope (i.e., the allometric coefficient) is unbiased.
This is true, as shown by Rohlf (2006).

2) However, also as shown by Rohlf and pointed out by TG, these estimates are not the "minimum variance estimates" (meaning that although they are not biased on average, any individual allometric coefficient can be quite poorly estimated).

3) It's relatively straightforward to perform size-correction using the phylogeny.

4) The consequence of using a bad allometric coefficient (i.e., one estimated using linear regression ignoring the phylogeny) for size-correction can be substantial, even if subsequent analyses are performed using phylogenetic methods. I have shown in a recently submitted manuscript that under simplistic simulation conditions (pure-birth trees, n(taxa)=100) the type I error rate of a contrasts regression performed on data that were subject to non-phylogenetic size-correction can be doubled over its nominal level. This situation would be worsened under more realistic conditions (e.g., for birth-death phylogenies).

5) In my manuscript I provide mathematical details and computer (R, Matlab) code for all the calculations required to perform phylogenetic size-correction or PCA and obtain residuals or scores for species (which are then the input for subsequent phylogenetic statistical analyses, such as regression on independent contrasts). In the event that it gets favorable reviews and is eventually published, please look for it!
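
For readers who want to see the general shape of point 3, here is a minimal sketch of phylogenetic size-correction by generalized least squares (GLS). It is not the code from the manuscript mentioned above. It assumes Brownian motion, so that the error covariance among species is proportional to a matrix C of shared branch lengths, and the function names (`solve`, `dot`, `phylo_size_correct`), the four-taxon tree, and the trait values are all made up for illustration.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phylo_size_correct(C, size, trait):
    """GLS regression of trait on size, error covariance proportional to C.

    Returns (intercept, slope, residuals). The residuals are the
    size-corrected values, one per species.
    """
    ones = [1.0] * len(size)
    Ci_1 = solve(C, ones)   # C^-1 applied to the column of ones
    Ci_x = solve(C, size)   # C^-1 applied to the size vector
    # Normal equations (X' C^-1 X) beta = X' C^-1 y, with X = [1, size].
    # C is symmetric, so 1' C^-1 y equals (C^-1 1)' y, and likewise for size.
    XtCiX = [[dot(ones, Ci_1), dot(ones, Ci_x)],
             [dot(size, Ci_1), dot(size, Ci_x)]]
    XtCiy = [dot(Ci_1, trait), dot(Ci_x, trait)]
    intercept, slope = solve(XtCiX, XtCiy)
    residuals = [y - intercept - slope * x for x, y in zip(size, trait)]
    return intercept, slope, residuals


# Hypothetical tree ((A,B),(C,D)) of total height 1; each pair shares 0.5.
tree_cov = [[1.0, 0.5, 0.0, 0.0],
            [0.5, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.5],
            [0.0, 0.0, 0.5, 1.0]]
log_size = [1.0, 1.2, 2.0, 2.5]       # made-up log body sizes
log_trait = [1.05, 1.10, 1.75, 2.00]  # made-up log trait values

a, b, resid = phylo_size_correct(tree_cov, log_size, log_trait)
print("intercept:", a, "slope:", b)
print("size-corrected residuals:", resid)
```

Passing an identity matrix for C gives the ordinary least-squares fit, so the same function shows how much the phylogeny changes the allometric coefficient for a given data set. The residuals would then go into whatever phylogenetic analysis comes next.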

Thanks to Luke H. for bringing this discussion thread to my attention, and to Dan R. for initiating it.

Glor said...

Nice summary Liam.

Dan Rabosky said...

Thanks, Liam. One thing I wonder about is whether model misspecification can swamp any error due to failing to correct for phylogeny during size-correction (or due to neglecting intraspecific error, for that matter). Granted, we can test for this, but it seems pretty easy to reject the BM model in general.

So... I'm just thinking generally about the fact that our ability to statistically treat all sources of error in our analyses is improving dramatically, but fundamentally is conditional on some particular model. How robust is all of this to model misspecification? It seems to me a hard problem, because absolute model fit is not so easy to evaluate. We might know that model B (say, OU with 1 opt) fits the data better than models A, C, D, and E based on AIC, but this does not necessarily mean the model fits the data well in any meaningful sense.
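
A small sketch of why AIC alone cannot answer the absolute-fit question: AIC = 2k - 2 ln L, and model selection uses only differences in AIC among the candidates. The log-likelihoods and parameter counts below are invented purely for illustration, and the helper names are hypothetical. Making every model fit worse by the same amount leaves the ranking and the Akaike weights untouched.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k free parameters."""
    return 2 * k - 2 * loglik


def akaike_weights(aic_by_model):
    """Relative support for each model, based only on AIC differences."""
    best = min(aic_by_model.values())
    rel = {m: math.exp(-0.5 * (a - best)) for m, a in aic_by_model.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


# Invented (log-likelihood, number of parameters) pairs for three models.
fits = {"A": (-52.0, 2), "B": (-47.5, 3), "C": (-51.0, 3)}

weights = akaike_weights({m: aic(ll, k) for m, (ll, k) in fits.items()})

# Same models, but every log-likelihood is 100 units lower.
worse = akaike_weights({m: aic(ll - 100.0, k) for m, (ll, k) in fits.items()})

print(weights)
print(worse)  # same weights: AIC sees only the differences among models
```

So model B can win comfortably whether all of the candidates describe the data well or all of them describe it badly.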

Liam Revell said...

Hi again, Dan.

Since we're talking about correcting for interspecific allometries, it's probably more relevant to consider stochastic linear OU rather than discrete optima Ornstein-Uhlenbeck (à la Butler and King 2004).

In this model, species revert to a consistent line of allometry, rather than to a consistent adaptive peak or set of peaks. It thus avoids the problem of "inherited maladaptation" that can exist for Brownian Motion (BM).

For very strong adaptation to the line of allometry, the error structure of the residuals of the SL-OU model will converge to a diagonal matrix (i.e., become non-phylogenetic). And, much like with traditional OU, when adaptation relative to so-called "phylogenetic inertia" is weak, then the error structure converges to BM.
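
The two limits can be checked numerically. The sketch below does not implement the SL-OU residual structure itself. As a stand-in it uses the covariance of a traditional single-optimum OU process with a fixed root state on an ultrametric tree, V_ij = (σ²/2α) exp(-2α(T - s_ij)) (1 - exp(-2α s_ij)), where s_ij is the shared history of species i and j and T is the tree height. The function names and the four-taxon tree are hypothetical.

```python
import math


def ou_cov(shared, height, alpha, sigma2=1.0):
    """Single-optimum OU covariance among tips of an ultrametric tree.

    shared[i][j] is the time species i and j share; shared[i][i] == height.
    """
    n = len(shared)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = shared[i][j]
            decay = math.exp(-2.0 * alpha * (height - s))
            # -expm1(-x) is 1 - exp(-x), computed accurately for small x
            accum = -math.expm1(-2.0 * alpha * s)
            V[i][j] = sigma2 / (2.0 * alpha) * decay * accum
    return V


def to_corr(V):
    """Rescale a covariance matrix to a correlation matrix."""
    n = len(V)
    return [[V[i][j] / math.sqrt(V[i][i] * V[j][j]) for j in range(n)]
            for i in range(n)]


# Hypothetical tree ((A,B),(C,D)) of height 1; each pair shares 0.5.
shared = [[1.0, 0.5, 0.0, 0.0],
          [0.5, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.5],
          [0.0, 0.0, 0.5, 1.0]]

weak = ou_cov(shared, height=1.0, alpha=1e-6)             # weak adaptation
strong = to_corr(ou_cov(shared, height=1.0, alpha=50.0))  # strong adaptation

print(weak[0][1])    # close to the shared time, as under BM
print(strong[0][1])  # close to zero: effectively non-phylogenetic
```

With a very small α the matrix is nearly σ² times the shared-history matrix, which is the BM expectation, and with a large α the correlations among species vanish.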

Since what we are usually concerned with when correcting for size is minimizing the variance of our interspecific allometry slope and intercept (and we may not care too much about the evolutionary process per se), we might instead estimate Pagel's λ and then use the error structure described by the tree and λ for our regression model (e.g., Freckleton et al. 2002; Revell and Harrison 2008). This should work well, I think, so long as the evolutionary process has produced covariance between species that is proportional to their time of shared history (and the proportionality constant can even be zero, meaning no phylogenetic effect).
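
As a rough illustration of the error structure "described by the tree and λ": under the usual formulation, λ multiplies the off-diagonal elements of the shared-history matrix and leaves the diagonal alone. The sketch below shows only that transformation; estimating λ (typically by maximum likelihood) is not shown. The function name and the four-taxon matrix are made up for the example.

```python
def lambda_transform(C, lam):
    """Scale between-species covariances by lam; keep the variances as-is."""
    n = len(C)
    return [[C[i][j] if i == j else lam * C[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical tree ((A,B),(C,D)) of height 1; each pair shares 0.5.
C = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]

for lam in (0.0, 0.5, 1.0):
    print(lam, lambda_transform(C, lam)[0])
```

At λ = 1 the matrix is the Brownian motion expectation, and at λ = 0 it is diagonal, which is the "no phylogenetic effect" case. The transformed matrix would then take the place of C in a GLS regression of trait on size.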