dechronization: May 2009

Friday, May 29, 2009

Evolution 2009 Program Available

The program for this year's meeting is now available, complete with a heat map illustrating how keywords are distributed among concurrent sessions (see image). Because I have an 8AM slot I'd like to use this opportunity to pre-emptively accept the apologies of everyone who's going to tell me they missed my talk because they slept in.

Programs Gone AWOL: Structurama

Anybody know what became of Structurama (Huelsenbeck's Bayesian approach to the same type of population structure questions addressed by Pritchard's Structure)? The website for the program - www.structurama.org - is currently off-line. I heard rumors of a revamped, user-friendly implementation and am assuming that page is off-line until this revised version is ready for public use.

Saturday, May 23, 2009

Adaptive Radiation v. Dog

I regret to report that one of the world's most spectacular adaptive radiations is no match for man's best friend. I had a lovely pink columbine in full bloom until my dog Eddie decided he was ready to begin his own career as a gardener. As beautiful and effective as those nectar spurs may be, they're no match for jaws. Perhaps we can get Justen Whittall to model the type of spur that might be required for specialization to a canine pollinator?

Wednesday, May 20, 2009

Coordinating Travel to Moscow, Idaho for Evolution 2009

London, Mexico City, Santo Domingo (Dominican Republic), Dublin, San Juan (Puerto Rico), San Jose (Costa Rica). What do these cities have in common? They're just a few of the exciting places I could visit more cheaply and more conveniently than the location of this year's Evolution meetings: Moscow, Idaho. In addition to the 10-hour flight from Rochester to Spokane, I've learned that the shuttles between Spokane and Moscow offered by the conference organizers are utterly useless for my itinerary (arrival at midnight on June 12th and departure at 8:30PM on June 16th). If anybody else is still working on transportation between Spokane and Moscow, perhaps we can use the comments section here to help coordinate shared transportation? I've been given the impression that the only options outside of the conference shuttles are taxi (>$80 one-way fare) or car rental.

Friday, May 15, 2009

More "Parasites Rule"

It's Friday afternoon, so I thought a little parasite distraction might be in order. Over at Mental Floss, there's a nice list of eight different parasites that affect their hosts' behavior, in order to promote their transmission: zombie snails, crabs, caterpillars, grasshoppers, fish, ants, and cockroaches. Good creepy stuff for cocktail party or coffee shop evolutionary chit-chat for sure! The last sentence is a question about whether or not there are parasites that affect human behavior. Clearly these folks need to read up on Toxoplasma!

Saturday, May 9, 2009

Anolis Symposium 2009

This October 2nd-4th, the Museum of Comparative Zoology at Harvard University will host the 2009 Anolis symposium. It's been over ten years since anole biologists gathered at Penn State, so there should be plenty to discuss. Registration is only $20 and slots for presenters are still available. The recently completed genome of Anolis carolinensis will be the subject of a satellite meeting taking place at the Broad Institute of MIT and Harvard on October 5th. Worried you don't know enough about anoles to enjoy a conference like this? Just pick up a copy of Losos's new book and you'll be an expert before you know it (it's 20% off if you order prior to its July 9th release)! Photo credit: Anolis fowleri by Luke Mahler.

Thursday, May 7, 2009

Final Exam Day!

Earlier this morning I gave the final exam for my course "The Tree of Life." Click the image to the right if you'd like to try your luck with the first page of my exam!

When We Fail MrBayes, Part II

For this (unauthorized) installment of When we fail MrBayes, I’d like to step back and look at how we assess convergence in the first place. I’ve encountered a few datasets now for which convergence problems would be very difficult to diagnose without tools like AWTY. I’ve found two diagnostics in AWTY to be especially useful. First, the slide command shows the posterior probabilities of clades for non-overlapping windows of trees in the sample: basically, a sliding window of posterior probabilities. If a coarse scale of analysis shows posterior probabilities that vary widely during the course of a single run, this is strong evidence that runs have not converged. I am also a big fan of the compare command, which plots pairwise split frequencies for a series of independent MCMC runs.
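To make the sliding-window idea concrete, here is a minimal Python sketch. It is not AWTY's implementation: the function name `sliding_window_support` and the representation of each sampled tree as a set of clades (each clade a frozenset of taxon names) are assumptions made for illustration only.

```python
from collections import Counter


def sliding_window_support(trees, window):
    """Return one {clade: frequency} dict per non-overlapping window of trees.

    `trees` is a list of sampled trees, each given as a set of clades
    (hypothetical representation; a clade is a frozenset of taxon names).
    Trailing trees that do not fill a whole window are ignored.
    """
    supports = []
    for start in range(0, len(trees) - window + 1, window):
        chunk = trees[start:start + window]
        # Count in how many trees of this window each clade appears.
        counts = Counter(clade for tree in chunk for clade in tree)
        supports.append({clade: n / window for clade, n in counts.items()})
    return supports


# Toy chain: the sampler sits on clade (A,B) and then switches to (A,C).
AB = frozenset({"A", "B"})
AC = frozenset({"A", "C"})
chain = [{AB}] * 4 + [{AC}] * 4
windows = sliding_window_support(chain, window=4)
```

In this toy chain the support for clade (A,B) drops from 1.0 in the first window to 0 in the second, which is the kind of wide swing within a single run that signals a lack of convergence.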

As a friend asked yesterday, why aren’t more folks using AWTY to assess convergence? As far as I am aware, this is the only general diagnostic tool out there that is geared towards assessing convergence of topologies, rather than convergence of molecular evolutionary parameters. As such, it explicitly addresses the one set of parameters that are (usually) of greatest interest to systematists: the tree itself.

With this in mind, I conducted an informal survey of convergence diagnostics in the literature. I looked at all articles published in two recent issues of Molecular Phylogenetics and Evolution using Bayesian inference (mostly MrBayes, but a few using BEAST) and tabulated the convergence diagnostics used. MPE seems like it should be a reasonable gauge of methods currently used by practicing systematists, although it would not surprise me if papers in some journals (Systematic Biology?) use convergence assessment that is, on average, more rigorous. Anyway, in 25 studies:

3 studies reported only that they “examined stationarity of LnL values” or something to this effect. I hesitate to say that this is the worst possible test of convergence, because 5 studies reported no test of convergence whatsoever and another tested for convergence by ‘discarding burn-in’. I think most readers of this blog would agree that these are generally not adequate.

The most frequent class involved some variation of analyzing multiple runs (11 studies); this includes checking the standard deviation of split frequencies for independent runs (6 studies) and comparing posterior probabilities for independent runs (2 studies). I think this is a good general strategy, but the majority of these considered only 2 independent runs. This is not good. Let’s imagine that treespace for your dataset contains two (rather different) topologies of high and equal probability. At convergence, your MCMC sampler should visit both of these topologies in proportion to their posterior probability (say, ~47% of the time for each, as no other topologies are nearly as good).

A major problem arises if it takes many generations to move between these topologies. Even for a highly optimized MCMC sampler, it does not surprise me at all that it might take many millions of generations to move between “distant” regions of treespace, particularly for large datasets. If this is true, two runs are far from adequate, because there is a 50% chance that two independent runs will find the same high-probability topology first, and – if not run for a sufficient number of generations – it will appear as though the runs have converged, based on similarity of posterior probabilities, standard deviation of split frequencies, etc. This is something of a worst-case scenario, because – as I’ve described it – certain clades would appear to have ~1.00 posterior probability, when in fact the true posterior probability might be closer to 0.5. Unfortunately, there appears to be no good way of determining a ‘sufficient’ number of generations a priori, so the only solution here seems to be ‘lots of runs.’
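The arithmetic behind ‘lots of runs’ can be sketched in a few lines of Python. This is a back-of-the-envelope calculation under simplifying assumptions that go beyond the scenario above: each run independently lands on one of the equally probable topologies and never leaves it during the analysis. The function name `prob_all_runs_agree` is made up for illustration.

```python
def prob_all_runs_agree(n_runs, n_topologies=2):
    """Chance that every run lands on the same one of n equally likely topologies.

    Assumes independent runs that each get stuck on whichever
    high-probability topology they find first (a worst-case simplification).
    """
    return n_topologies * (1 / n_topologies) ** n_runs


for n_runs in (2, 4, 8):
    print(n_runs, prob_all_runs_agree(n_runs))
```

With two runs the chance of spurious agreement is 0.5, with four runs it is 0.125, and with eight it falls below 0.01. Under these assumptions, every additional run halves the chance of being fooled.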

The next most frequent strategy involved checking convergence of molecular evolutionary parameters, either by estimating effective sample sizes of parameters (6 studies), or by checking the Gelman-Rubin potential scale reduction factor (1 study). I am skeptical of these approaches, considered alone, because I’ve found that there is often little correspondence between convergence of molecular evolutionary parameters and convergence of topologies. Perhaps this reflects some particularly troublesome datasets that I have worked with, but it does not leave me feeling encouraged. Note that I am not claiming that monitoring these parameters is unimportant, but that it is fundamentally inadequate with respect to our interest in topologies.
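For readers who have not met the Gelman-Rubin statistic, here is a minimal Python sketch of the basic calculation for one scalar parameter (for example, the log-likelihood trace of each run). It is a textbook-style version, not the code used by MrBayes or any other package, and it assumes equal-length chains with non-zero within-chain variance. The function name `psrf` and the toy traces are made up for illustration.

```python
from statistics import mean, variance


def psrf(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one parameter.

    `chains` is a list of equal-length lists of sampled values, one per run.
    Values near 1 suggest the runs are sampling the same distribution.
    """
    n = len(chains[0])
    within = mean([variance(chain) for chain in chains])          # W
    between_over_n = variance([mean(chain) for chain in chains])  # B / n
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


# Toy traces: two runs that overlap, and two runs stuck in different places.
overlapping = [[1, 2, 3, 4], [2, 1, 4, 3]]
separated = [[1, 2, 3, 4], [11, 12, 13, 14]]
```

Here `psrf(overlapping)` is below 1.1 while `psrf(separated)` is well above it. The statistic only sees the scalar trace, though, so two runs sitting on different topologies with similar likelihoods could still pass, which is the inadequacy described above.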

Finally, 3 studies used AWTY, which seems to me rather low given the potential utility of the software in diagnosing convergence failure. On the whole, the results of this survey do not encourage me. Is our research community doing enough to diagnose convergence failure in MCMC analyses? How severe is this problem? Maybe I’m making a mountain out of a molehill here based on my own experience with a few poorly-behaved datasets. But looking at the literature, it is hard to convince myself that most studies are adequately diagnosing convergence problems, and I can’t help but feel a bit unsettled by all of this.

Wednesday, May 6, 2009

Dechronization Interviews Rob DeSalle

Recently, I sat down at a local watering hole with my colleague, Rob DeSalle, and we talked about cladistics and the changing field of evolutionary biology. Along the way, Western textiles, horror movies, and a couple of colorful expletives came up…and in the end, he told me it was the most serious conversation we’d ever had.

SP: So, let’s start with the big question: Are you a cladist?
RD: I don’t think I’ve ever been a member of the Willi Hennig Society. But, yes – in the sense that if someone said that I had to only use one method, I would choose cladistics.

SP: Why?
RD: Because I’m a minimalist. There are fewer assumptions in parsimony.

SP: But isn’t that one assumption a rather big one?
RD: Yes, but that big one is easier to track than lots of little ones.

SP: But, you do use other methods, too, right?
RD: Yes, but I don’t do that to test the results of one method against another. I do it to test assumptions. Cladists were among the first to really examine analysis space, i.e. do sensitivity analysis, beginning with the “Navajo rugs” of Wheeler and then Gatesy’s explorations of alignment space. Then Hillis, Bull, Huelsenbeck, etc. began doing the same thing, but with likelihood. And, we’re seeing a new wave of this now with whole genome analysis. I don’t believe that one should be pluralistic – just because lots of trees agree doesn’t mean you’ve found “the truth.” But, modified pluralism is testing the effects of your assumptions. You could stack your assumptions to give you the answer that you want, but then you’d be no better than the evolutionary taxonomists of the 1950s.

SP: But, let’s take the opposite view – what do you do if you get a lot of disagreement?
RD: It would tell me that I needed to collect more data. Unless of course I’ve already sequenced the whole genome, then I’m f$@&ed.

SP: OK, let’s talk for a minute about your role as an editor. You’ve been on the editorial board of a huge number of journals right?
RD: I don’t know if huge is the right adjective, but I have been on a few – Evolution, Molecular Phylogenetics and Evolution, Molecular Biology and Evolution, PLoS One, Conservation Genetics, Ancient DNA, BioEssays, and Mitochondrial DNA. Being an editor is great – you get to see all these papers before anyone else does – and you get to see the field grow and change. But I don’t think it’s the job of an editor to mold the field, per se – just monitor it.

SP: So, from the standpoint of an editor, how has the field of systematic biology grown and changed over the years?
RD: We are more aware of analysis space – and if it hasn’t been looked at in a paper, that really irritates me. Students are more and more sophisticated with respect to their toolkits. In the beginning, I was using RFLPs and running the very first version of PAUP with IBM cards. Now students are using any number of analytical approaches and a plethora of computer packages. In the old days, you had to just make a tree, but now it’s so easy to collect data that you can really think about the questions. Students understand the chemistry less but the algorithms better.

SP: So, what would your advice to students be?
RD: Twenty years ago, I told all of them to learn more biochemistry. Ten years ago, I was saying that they needed to learn more statistics so that they could understand maximum likelihood and Bayesian analyses. But, today, my advice is to learn programming – Perl, R, etc. You need to be able to manipulate your data – move s%#t around on your computer. They need to understand probability, statistics, parsimony, etc, but I don’t think the philosophical ramifications are as important.

SP: You’ve gone from doing population genetics and systematics on Drosophila to being really involved with microbial genomics and phylogenetics. Do you really think a bacterial tree of life is possible?
RD: Absolutely. From some of the work we are doing in the labs here at the AMNH we are finding that reticulation does not necessarily destroy phylogeny as severely as we’ve long thought. Even with horizontal gene transfer, there is a real phylogeny. And even if HGT is widespread, you still need the scaffold. It’s the same problem as lineage sorting in metazoa. I call them the “Freddy” and “Jason” of systematics – but just remember that Freddy died in 1991 and Jason went to hell in 1993 – in other words, these problems don’t obliterate the whole signal.

SP: What are the big remaining phylogenetic questions out there?
RD: Microbial taxonomy, for sure. It’s one of the most important fields. The existing taxonomy is good, but it needs to be done about 3 orders of magnitude better. The other one I would pick would be the nematodes. Taxonomy is like, really cool and important. It’s a real science!

Rob DeSalle is a Curator of Entomology at the American Museum of Natural History. He is affiliated with the AMNH Division of Invertebrate Zoology and works at the Sackler Institute for Comparative Genomics, where he leads a group of researchers working on molecular systematics, molecular evolution, population and conservation genetics, and evolutionary genomics of a wide array of life forms ranging from viruses, bacteria, corals, and plants, to all kinds of insects, reptiles, and mammals. Rob is also Adjunct Professor at Columbia University (Department of Ecology, Evolution and Environmental Biology), Distinguished Professor in Residence at New York University (Department of Biology), Adjunct Professor at City University of New York (Subprogram in Ecology, Evolutionary Biology, and Behavior), Resource Faculty at the New York Consortium in Evolutionary Primatology, and Professor at the AMNH Richard Gilder Graduate School.

Friday, May 1, 2009

Smitten with Obama

The change in the status of science in the new White House is cause for optimism. Obama became only the fourth president to address the annual meeting of the National Academy of Sciences, where he announced the appointees to the President's Council of Advisors on Science and Technology (PCAST).

Among those are a couple of names that are well known in our circles. The former President of the Society for the Study of Evolution, Barbara Schaal, was among the newly announced appointees. Eric Lander's name was previously announced.

Obama also committed to doubling the budget of the National Science Foundation in his speech (it's quite good and well worth listening to while you're pipetting). He promised to increase the level of funding for science to 3% of the gross domestic product. This exceeds the amount invested in 1964 at the height of the space race. He also promised that the funds will target high-risk, high-return research and support researchers at the beginning of their careers.

While all signs from the executive branch (happily) point to both steady and substantial increases in attention and funding [1,2,3,4], one has to be deeply concerned with the status of science in the legislative branch.

Geneious Comes to AMNH

Candace Toner, CEO of Biomatters, and Steven Stones-Havas, a developer from Geneious, came to the AMNH today and gave us a presentation on some new features of version 4.6 and a sneak preview of version 4.7, to be released soon. Some nifty new features include tools to help better visualize alignments and trees as well as the ability to link via "Green Button" to run things on the New Zealand Supercomputing Center (which, for geeks out there, was the same cluster used to render LOTR). Lots more plug-ins are planned as well. They were also really great about answering questions from our staff and students and took down some suggestions we had for slight improvements. They even bought us all pizza for lunch. Now that's customer service!