Friday, May 29, 2009
Evolution 2009 Program Available
Programs Gone AWOL: Structurama
Saturday, May 23, 2009
Adaptive Radiation v. Dog
Wednesday, May 20, 2009
Coordinating Travel to Moscow, Idaho for Evolution 2009
Friday, May 15, 2009
More "Parasites Rule"
Saturday, May 9, 2009
Anolis Symposium 2009
Thursday, May 7, 2009
Final Exam Day!
When We Fail MrBayes, Part II
As a friend asked yesterday, why aren’t more folks using AWTY to assess convergence? As far as I am aware, this is the only general diagnostic tool out there that is geared towards assessing convergence of topologies, rather than convergence of molecular evolutionary parameters. As such, it explicitly addresses the one set of parameters that is (usually) of greatest interest to systematists: the tree itself.
With this in mind, I conducted an informal survey of convergence diagnostics in the literature. I looked at all articles published in two recent issues of Molecular Phylogenetics and Evolution using Bayesian inference (mostly MrBayes, but a few using BEAST) and tabulated the convergence diagnostics used. MPE seems like it should be a reasonable gauge of methods currently used by practicing systematists, although it would not surprise me if papers in some journals (Systematic Biology?) use convergence assessment that is, on average, more rigorous. Anyway, in 25 studies:
3 studies reported only that they “examined stationarity of LnL values” or something to this effect. I hesitate to say that this is the worst possible test of convergence, because 5 studies reported no test of convergence whatsoever and another tested for convergence by ‘discarding burn-in’. I think most readers of this blog would agree that these are generally not adequate.
The most frequent class involved some variation of analyzing multiple runs (11 studies); this includes checking the standard deviation of split frequencies for independent runs (6 studies) and comparing posterior probabilities for independent runs (2 studies). I think this is a good general strategy, but the majority of these considered only 2 independent runs. This is not good. Let’s imagine that treespace for your dataset contains two (rather different) topologies of high and equal probability. At convergence, your MCMC sampler should visit both of these topologies in proportion to their posterior probability (say, ~47% of the time for each, as no other topologies are nearly as good).
A major problem arises if it takes many generations to move between these topologies. Even for a highly optimized MCMC sampler, it would not surprise me at all if it took many millions of generations to move between “distant” regions of treespace, particularly for large datasets. If this is true, two runs are far from adequate, because there is a 50% chance that two independent runs will find the same high-probability topology first and, if not run for a sufficient number of generations, will appear to have converged based on similarity of posterior probabilities, standard deviation of split frequencies, etc. This is something of a worst-case scenario because, as I’ve described it, certain clades would appear to have ~1.00 posterior probability when in fact the true posterior probability might be closer to 0.5. Unfortunately, there appears to be no good way of determining a ‘sufficient’ number of generations a priori, so the only solution here seems to be ‘lots of runs.’
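The coin-flip logic above extends to more than two runs. As a rough sketch, assume each independent run starts out on one of k equally probable topology "islands" and never leaves it during the analysis, which is the worst case described above. The function name and the equal-probability assumption are hypothetical, chosen only for illustration. Under those assumptions, the chance that every run lands on the same island, so that the runs look converged when they are not, is k × (1/k)^n for n runs:

```python
from fractions import Fraction


def prob_false_agreement(n_runs: int, n_islands: int = 2) -> Fraction:
    """Probability that all runs get stuck on the same island.

    Illustrative assumption: each run independently picks one of
    `n_islands` equally probable islands and stays there.
    """
    if n_runs < 1 or n_islands < 1:
        raise ValueError("n_runs and n_islands must be positive")
    # Any island can be the shared one; each run hits it with probability 1/k.
    return n_islands * Fraction(1, n_islands) ** n_runs


for n in (2, 4, 8):
    print(n, float(prob_false_agreement(n)))
```

With two islands, two runs falsely agree half the time, four runs one time in eight, and eight runs less than 1% of the time. Real treespace will not be this tidy, so treat the numbers as intuition for why more runs help, not as a rule for how many to use.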
The next most frequent strategy involved checking convergence of molecular evolutionary parameters, either by estimating effective sample sizes of parameters (6 studies), or by checking the Gelman-Rubin potential scale reduction factor (1 study). I am skeptical of these approaches, considered alone, because I’ve found that there is often little correspondence between convergence of molecular evolutionary parameters and convergence of topologies. Perhaps this reflects some particularly troublesome datasets that I have worked with, but it does not leave me feeling encouraged. Note that I am not claiming that monitoring these parameters is unimportant, but that it is fundamentally inadequate with respect to our interest in topologies.
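For reference, the Gelman-Rubin statistic compares the within-run and between-run variance of a single numeric parameter across independent runs. Below is a minimal sketch of the classic form of the calculation. It is not necessarily what MrBayes or any other program reports, and the function name and toy traces are made up for illustration:

```python
from statistics import mean, variance


def psrf(chains):
    """Classic Gelman-Rubin potential scale reduction factor.

    `chains` holds one list of post-burn-in samples per independent run,
    all for the same scalar parameter and all of equal length.
    """
    m = len(chains)
    n = len(chains[0]) if m else 0
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    # W: average of the within-chain sample variances.
    w = mean(variance(c) for c in chains)
    if w == 0:
        raise ValueError("within-chain variance is zero")
    # B / n: sample variance of the chain means.
    b_over_n = variance([mean(c) for c in chains])
    # Pooled estimate of the posterior variance.
    var_hat = (n - 1) / n * w + b_over_n
    return (var_hat / w) ** 0.5


# Toy traces (invented numbers): two runs that overlap, two that do not.
overlapping = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
separated = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
print(psrf(overlapping), psrf(separated))
```

Values close to 1 suggest the runs agree on that parameter, and values well above 1 suggest they do not. The calculation needs a numeric trace, so it says nothing directly about whether the runs agree on the tree.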
Wednesday, May 6, 2009
Dechronization Interviews Rob DeSalle
SP: So, let’s start with the big question: Are you a cladist?
RD: I don’t think I’ve ever been a member of the Willi Hennig Society. But, yes – in the sense that if someone said that I had to only use one method, I would choose cladistics.
SP: Why?
RD: Because I’m a minimalist. There are fewer assumptions in parsimony.
SP: But isn’t that one assumption a rather big one?
RD: Yes, but that big one is easier to track than lots of little ones.
SP: But, you do use other methods, too, right?
RD: Yes, but I don’t do that to test the results of one method against another. I do it to test assumptions. Cladists were among the first to really examine analysis space, i.e. do sensitivity analysis, beginning with the “Navajo rugs” of Wheeler and then Gatesy’s explorations of alignment space. Then Hillis, Bull, Huelsenbeck, etc. began doing the same thing, but with likelihood. And, we’re seeing a new wave of this now with whole genome analysis. I don’t believe that one should be pluralistic – just because lots of trees agree doesn’t mean you’ve found “the truth.” But, modified pluralism is testing the effects of your assumptions. You could stack your assumptions to give you the answer that you want, but then you’d be no better than the evolutionary taxonomists of the 1950’s.
SP: But, let’s take the opposite view – what do you do if you get a lot of disagreement?
RD: It would tell me that I needed to collect more data. Unless of course I’ve already sequenced the whole genome, then I’m f$@&ed.
SP: OK, let’s talk for a minute about your role as an editor. You’ve been on the editorial board of a huge number of journals, right?
RD: I don’t know if huge is the right adjective; but I have been on a few: Evolution, Molecular Phylogenetics and Evolution, Molecular Biology and Evolution, PLoS One, Conservation Genetics, Ancient DNA, BioEssays, and Mitochondrial DNA. Being an editor is great – you get to see all these papers before anyone else does – and you get to see the field grow and change. But I don’t think it’s the job of an editor to mold the field, per se – just monitor it.
SP: So, from the standpoint of an editor, how has the field of systematic biology grown and changed over the years?
RD: We are more aware of analysis space – and if it hasn’t been looked at in a paper, that really irritates me. Students are more and more sophisticated with respect to their toolkits. In the beginning, I was using RFLP’s and running the very first version of PAUP with IBM cards. Now students are using any number of analytical approaches and a plethora of computer packages. In the old days, you had to just make a tree but now it’s so easy to collect data that you can really think about the questions. Students understand the chemistry less but the algorithms better.
SP: So, what would your advice to students be?
RD: Twenty years ago, I told all of them to learn more biochemistry. Ten years ago, I was saying that they needed to learn more statistics so that they could understand maximum likelihood and Bayesian analyses. But, today, my advice is to learn programming – Perl, R, etc. You need to be able to manipulate your data – move s%#t around on your computer. They need to understand probability, statistics, parsimony, etc., but I don’t think the philosophical ramifications are as important.
SP: You’ve gone from doing population genetics and systematics on Drosophila to being really involved with microbial genomics and phylogenetics. Do you really think a bacterial tree of life is possible?
RD: Absolutely. From some of the work we are doing in the labs here at the AMNH we are finding that reticulation does not necessarily destroy phylogeny as severely as we’ve long thought. Even with horizontal gene transfer, there is a real phylogeny. And even if HGT is widespread, you still need the scaffold. It’s the same problem as lineage sorting in metazoa. I call them the “Freddy” and “Jason” of systematics – but just remember that Freddy died in 1991 and Jason went to hell in 1993 – in other words, these problems don’t obliterate the whole signal.
SP: What are the big remaining phylogenetic questions out there?
RD: Microbial taxonomy, for sure. It’s one of the most important fields. The existing taxonomy is good, but it needs to be done about 3 orders of magnitude better. The other one I would pick would be the nematodes. Taxonomy is like, really cool and important. It’s a real science!
Rob DeSalle is a Curator of Entomology at the American Museum of Natural History. He is affiliated with the AMNH Division of Invertebrate Zoology and works at the Sackler Institute for Comparative Genomics, where he leads a group of researchers working on molecular systematics, molecular evolution, population and conservation genetics, and evolutionary genomics of a wide array of life forms ranging from viruses, bacteria, corals, and plants, to all kinds of insects, reptiles, and mammals. Rob is also Adjunct Professor at Columbia University (Department of Ecology, Evolution and Environmental Biology), Distinguished Professor in Residence at New York University (Department of Biology), Adjunct Professor at City University of New York (Subprogram in Ecology, Evolutionary Biology, and Behavior), Resource Faculty at the New York Consortium in Evolutionary Primatology, and Professor at the AMNH Richard Gilder Graduate School.