Tuesday, January 5, 2010
That Darn LBA
Ah...I remember the days. As a young grad student trying to wrap my head around systematic biology, I found myself immersed in, and sometimes confused by, the debates going on in the literature - and in seminar rooms - over parsimony vs. maximum likelihood (ML). One favored topic of discussion was susceptibility to long-branch attraction (LBA). In my case, I went so far as to organize a graduate seminar that involved reading lots of papers and dragging John Huelsenbeck and Mark Siddall up to Burlington in the dead of winter to try to set us straight. I don't know about the rest of my cohort, but I finished the semester thinking that the only reasonable solution was to do my best to sample enough taxa to disrupt LBA as much as possible.
And then, much of the controversy died down for a while. Part of this, I speculate, was due to the development of the theory behind Bayesian inference (BI) and its growing popularity in phylogenetic analyses. BI was thought to have an advantage over ML in that it could incorporate uncertainty in the "nuisance parameters" of an analysis.
A recent paper in PLoS One by Bryan Kolaczkowski and Joe Thornton, however, has made LBA rear its ugly head again. In this paper, Kolaczkowski and Thornton presented convincing data that BI is very susceptible to inconsistency and bias, particularly in cases of LBA (the "Felsenstein zone") - and that these problems are exacerbated as the amount of sequence data increases, with the posterior probability support values for incorrect clades converging to 1.0. Kolaczkowski and Thornton explored these effects with classical four-taxon trees, with real, known-to-be-problematic datasets (the troublesome Encephalitozoon), and with other datasets with prescribed heterotachy and other heterogeneous parameters in the evolutionary model. Importantly, they contend that "more sophisticated MCMC algorithms and more complex priors" cannot alleviate the bias that BI shows.
The blossoming field of phylogenomics, and the desire to incorporate larger and larger matrices into our systematic analyses, may thus lead us to produce well-supported but false trees if BI is used and our datasets contain instances of LBA - and really, whose don't? This was a good read with some very important implications. I'm anxious to hear what others think of it.
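For anyone who wants a concrete feel for the "Felsenstein zone" on a classical four-taxon tree, here is a minimal sketch. To be clear, it is not the Kolaczkowski and Thornton analysis and it says nothing about BI: it only illustrates the old parsimony version of LBA. It assumes a two-state symmetric substitution model, and the function names and per-branch change probabilities are my own illustrative choices. The true tree is ((A,B),(C,D)), with long branches leading to A and C.

```python
from itertools import product


def pattern_probs(p_a, p_b, p_c, p_d, p_int):
    """Exact site-pattern probabilities on the true tree ((A,B),(C,D)).

    Assumes a two-state symmetric model; each argument is the probability
    that the state differs between the two ends of that branch.
    """
    def step(child, parent, p):
        # Probability of the child state given the parent state.
        return p if child != parent else 1.0 - p

    probs = {}
    for a, b, c, d in product((0, 1), repeat=4):
        total = 0.0
        # Sum over the unobserved states x (ancestor of A, B)
        # and y (ancestor of C, D), with a uniform prior on x.
        for x, y in product((0, 1), repeat=2):
            total += (0.5
                      * step(a, x, p_a) * step(b, x, p_b)
                      * step(y, x, p_int)
                      * step(c, y, p_c) * step(d, y, p_d))
        probs[(a, b, c, d)] = total
    return probs


def parsimony_support(probs):
    """Expected fraction of sites that are informative for each grouping."""
    support = {"AB|CD": 0.0, "AC|BD": 0.0, "AD|BC": 0.0}
    for (a, b, c, d), pr in probs.items():
        if a == b and c == d and a != c:
            support["AB|CD"] += pr   # favors the true tree
        elif a == c and b == d and a != b:
            support["AC|BD"] += pr   # groups the two long branches
        elif a == d and b == c and a != b:
            support["AD|BC"] += pr
    return support


# Illustrative values only: long branches to A and C, short ones elsewhere.
zone = parsimony_support(pattern_probs(0.4, 0.05, 0.4, 0.05, 0.05))
# Control: every branch short.
safe = parsimony_support(pattern_probs(0.05, 0.05, 0.05, 0.05, 0.05))

print("Felsenstein zone:", zone)
print("All short branches:", safe)
```

With these made-up branch values, the expected share of sites grouping A with C is larger than the share supporting the true A+B grouping. Because these are expected frequencies, adding more sites only makes parsimony more confident in the wrong tree, which is the same flavor of "more data, more wrong" that makes the BI result above so worrying.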
Posted by Susan Perkins at 8:07 PM