Step Back from the Species Description, Ma'am...I'm a Taxonomist

Here at AMNH, I am surrounded by drawers and drawers, bottles and bottles, and cabinets and cabinets of specimens. A fair number of these are type specimens, and my colleagues have spent their careers carefully describing and depositing these and other specimens into collections. They publish these species descriptions in journals according to the rules of the International Code of Zoological Nomenclature. Bacterial taxonomists are an even stricter lot - they insist that all papers that name a new species be published in a single journal, the International Journal of Systematic and Evolutionary Microbiology. (I have bucked that rule.) In my work on malaria parasites, I have often been met with harsh reviews when I have tried to publish anything that has a sequence, but not a matching blood smear. Working on parasites with multiple life stages can be particularly challenging for species descriptions - ideally one would have specimens, images, measurements, etc. from each step of the life cycle, but those can be hard to obtain for many - or even most - parasites. In a recent paper, Chris Austin and I argued that incorporating DNA sequences into species descriptions can help bridge that gap.

The use of these sequences, however, should not come at the expense of traditional morphological analyses. Two recent papers "describe" new species of malaria parasites without really describing them. One was published in the high-impact journal PLoS Pathogens, and names a new species of Plasmodium in chimpanzees. Even though the authors say they examined slides of the new species under a microscope, there are no images, measurements, or discussions of morphological features that might allow a reader to visually differentiate the new species. To the contrary, they remark that the samples look a lot like P. falciparum. There is no type specimen. A subsequent paper with other samples from chimpanzees definitely hints that it may not be a new species, but may instead be another species known to infect chimps: P. reichenowi. Similarly, another paper recently reported a new species of Plasmodium in capybara. This discovery was particularly surprising given that no New World mammal malaria parasites are known, save for some in primates that are genetically indistinguishable from a human parasite. In this paper, again, there is no traditional description, there is no type specimen, there is a single image of the parasite's smallest and most challenging to identify stage, and the genetic results are of a paralogous gene represented by an unrooted cladogram.

I know that there has been a lot of discussion about whether electronic journals and ePubs ahead of print work with or violate the "code", but I have to ask: has traditional taxonomy gone extinct? Is this happening in other fields, too?

Online Phylogenetics Seminar!

Erick Matsen and colleagues have organized an online phylogenetics seminar that should be of interest to many Dechronization readers. The stated goals of the seminar series are to provide a forum for the discussion of phylogenetics methodology, disseminate information about ‘best practice’ phylogenetics, and to reduce our carbon footprint by reducing air travel. This seems like a really cool idea and I’m impressed by the present and past lineup of speakers they’ve assembled. I haven’t tuned in yet, but both Marc Suchard and Ward Wheeler gave seminars in the fall – you can view both of their seminars here.

I’m looking forward to the next set of seminars, which will include species tree estimation using BEAST. Coming up this Monday, January 25 (1300 PST) is Joseph Heled, who will be talking about “Gene-tree species-tree discordance.” And Noah Rosenberg and Jens Lagergren are on deck to give seminars during February and March. One cool idea that these folks are floating is to allow seminar “attendees” to vote on upcoming seminar topics, and you can email Erick to suggest both potential speakers and topics. Check out the Phyloseminar website for instructions on connecting - seems pretty straightforward and I’ll try and give it a shot today.

Masters Programs in Systematics

Every year I find myself advising a few students interested in masters programs in systematics. Most of these students express an interest in earning a doctoral degree, but lack the research experience required to know whether a PhD program is right for them or to be competitive for the best doctoral programs. My problem as an adviser is that I just don't know about many masters programs that are appropriate for such students. Because I'm sure there are plenty of programs out there, I'm asking for your help in finding them. By having this discussion on the blog, I'm hopeful that other students and advisers will be able to benefit from the information we're able to share.

In addition to inviting comments to this post, I'd like to invite anyone involved in a masters program in systematics to submit a brief blurb about their program as a guest post on Dechronization. Just send a concise one paragraph statement to me via e-mail (rglor -at-, being sure to include information on how your students are funded and whether your program is course- or research-based.

Phylogenetic Model Selection, Sans PAUP*

Given that an increasing share of statistical software in ecology and evolution is essentially free – e.g., open source and/or non-proprietary – I used to be bothered by the lack of suitable alternatives to PAUP* (which requires a licensing fee) for certain phylogenetic applications. Foremost among these is perhaps the ability to perform statistically rigorous phylogenetic model selection. There are now a number of free alternatives for phylogenetic model selection that do not require PAUP* (which is required by the widely used Modeltest and MrModeltest programs). I've probably been living in a bubble, because I just learned of several of these yesterday, but I thought I'd flag a few for Dechronization readers who might find this info useful.

One that I have used extensively is ModelGenerator, from the nice folks at the NUI Maynooth Bioinformatics Group. This is quite useful, not least because it has a web interface that lets you upload batches of fasta-formatted alignments. Because the computation is distributed across many “idle” desktop computers at NUI Maynooth, the processing time is low – I’ve uploaded batches of alignments only to get a results file emailed back to me within ~20 minutes or so. You can also run the program locally on your own machine or email the author for source. Additional options that I have yet to explore include jModeltest, FindModel (another web-based tool), and MrAIC. I'm sure this list is incomplete and welcome comments on programs I've missed as well as strengths and weaknesses of those I've listed.

"El terremoto" (the earthquake), as experienced in Santo Domingo

Kudos to Rich for helping to point out the various ways in which individuals can contribute to the recovery effort in Haiti, which most people know by now was devastated by an earthquake two days ago. I don't have much to add to his post, but since I happened to have been in Santo Domingo in the Dominican Republic at the time, I wanted to relate my experience.
A few people know that I was in the D. R. this past week helping Luke Mahler, along with Bryan Falk and Jose Luis Herrera, to collect several large series of anoles for an ongoing collaborative project between Jonathan Losos and Butch Brodie on the evolution of G and P matrices in Anolis lizards of the Caribbean. While Luke and Jose Luis were out dealing with the Dominican permitting authorities, Bryan and I spent all of Tuesday (our penultimate full day on the island) preparing the last of our large specimen series. This was to ensure that the specimens had at least 24 hours of "fixing time" before they'd need to be repackaged for transportation back to the MCZ at Harvard. When the earthquake struck, Bryan and I had just been rejoined by Luke and we were all sitting around the kitchen table (a.k.a. makeshift lab bench).
To both Bryan and me, the sensation of the earthquake was very strange. Since the notion of an earthquake seems totally preposterous to anyone who hasn't experienced one before (myself included), my first reaction was to think that I was just a little dizzy (perhaps from hours spent bent over trays of 95% ethanol). The whole world seemed to be swaying back and forth in front of me and I felt lightheaded. But at the moment that Bryan (who had been thinking the same thing) and I made eye contact, it was clear that we were not shaking - the world was. The tremor seemed to last about 10 seconds or so, during which time I got up and walked to the window to try and figure out if it was the building that was shaking, or the tree next to it (undoubtedly both were). Naturally, our reaction was something to the effect of "holy $%!*, was that an earthquake?" Luke said something like "that was crazy - you guys need to remind me about this later so we can check and see if it made the news." Someone might have also said - "Liam, you should write a Dechronization post about this." We then proceeded to debate what number on the Richter scale the earthquake might have been worth, and so forth (although neither Luke nor I had ever experienced one before), as we continued fixing lizards. At the time, we hadn't the slightest inkling of the devastation that had been wreaked in Port-au-Prince by the same tremors that we were discussing so casually over the dissecting tray.
So, I second Rich's suggestion that we all try and find ways (large or small) to help alleviate the pressure on Haiti that has been wrought by this latest disaster. It also seems clear that the devastation of these natural disasters is massively exacerbated by the generally low level of economic development in the region. Imagine 40,000-50,000 deaths from a natural disaster in the United States? It's inconceivable.
Since I have never been to Haiti, the picture above is one snapped by Bryan of a Haitian boy in the mountains near Polo, in the Barahona Province of the Dominican Republic.

Haiti Getting Kicked While Down (Again)

My students were on a mountain in the Dominican Republic close to the Haitian border when the 7.0 magnitude quake struck near Port-au-Prince. They described watching the horizon shake in front of them for a solid 10-15 seconds. Although estimates for the loss of life resulting from this brief episode remain unavailable, they are sure to reach into the tens of thousands.

Last summer, I blogged about field work in Haiti (1, 2, 3, 4). At that time, Haiti seemed like a country in shambles; its denuded earth was washing out to sea before our eyes, roads between major cities were barely passable, electricity was absent even in towns as large as 80,000 people, and clean water was sometimes impossible to come by. Nevertheless, a glimmer of hope accompanied the sense that things were better than they'd been in years. Whatever fragile progress Haiti may have made over the past few years, however, has just crumbled to the ground.

This quake, of course, is just another in a long string of injustices and misfortunes for Haiti and its people. Please consider making a small donation to one of the many aid organizations that are mobilizing to help Haiti through its latest (and perhaps greatest) crisis. Possibilities include Doctors without Borders and UNICEF. Donations are especially encouraged from countries like the United States or France, whose governments have spent the better part of the last 200 years doing little but destabilizing Haiti.

Music Video Tribute to Dolph Schluter

The seemingly increasing popularity of Darwin Day video parodies (and originals) made it pretty much a sealed deal that someone had to post one on YouTube that would have broad appeal within evolutionary biology circles. With the Lonely Island's single "I'm on a Boat" turning gold last summer, and UBC's evolution group being filled with creative smarties, I was barely surprised to see an excellent parody of a parody, "I'm in a Pond," appear last month. You can turn on closed captioning (CC) for clarified lyrics, which include, "I'm writing Nature papers/And you're stuck at Am Nat," and "Believe me when I say/I love E-d-a."

(I would like to encourage everyone to send in their Darwin Day videos. You can just post links in comments. We'll highlight the best ones on February 12th.)

Parasite of the Day

Hoping that parasites do not get left out of the whole "International Year of Biodiversity" thing, I have started a new site, "Parasite of the Day". I'm appealing to my parasitology colleagues to contribute - and to anyone with a favorite parasite to nominate it. Today's parasite is Plasmodium minuoviride, a cool lizard malaria that Chris Austin and I described last year. Go to the new parasite-of-the-day site to find out why.

Priors and convergence in BEST

Over the past year, I’ve heard a bit of grumbling about how difficult it is to achieve convergence using the program BEST (Bayesian Estimation of Species Trees). I was recently using BEST to analyze a reasonable-size multilocus dataset (20 taxa, 5 loci), but things were looking grim: convergence was not happening, and the trees looked nothing like concatenated and single locus trees estimated using RAxML. Then I came across this nice paper by Adam Leache, where he demonstrates a strong effect of priors on convergence in BEST. BEST requires setting two priors that are specific to the hierarchical species tree model: a prior on species population sizes (theta; thetapr), and a prior on the relative gene mutation rates (GeneMuPr). Leache showed that by increasing the mean of thetapr, he was able to obtain much faster convergence (inset).

Inspired, I set up a series of runs with my data where I increased the mean of thetapr – up to shape = 3 and scale = 0.1 [for a mean of scale / (shape – 1) = 0.05]. I found that performance improved immediately and dramatically: I was achieving significantly higher log-likelihoods within 200K generations of sampling than I had previously found in tens of millions of generations. After just 30 million generations – a single night’s worth of sampling – my analyses passed a battery of convergence tests, including AWTY-based sliding window analyses of the actual species tree sample. While I’ll run this out for a few more days to see what happens, the results are pretty encouraging.
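For anyone who wants to check the arithmetic before launching a run, here is a minimal Python sketch of the prior mean calculation quoted above, scale / (shape – 1). The function name is hypothetical and this is not BEST's own code or command syntax; it only evaluates the formula for a given shape and scale.

```python
def theta_prior_mean(shape: float, scale: float) -> float:
    """Mean of the theta prior, using the formula scale / (shape - 1).

    Hypothetical helper for illustration only; not part of BEST.
    The mean is only defined when shape > 1.
    """
    if shape <= 1:
        raise ValueError("the prior mean is undefined for shape <= 1")
    return scale / (shape - 1)


# Settings used in the runs described above: shape = 3, scale = 0.1
print(theta_prior_mean(shape=3.0, scale=0.1))
```

Trying a few shape and scale pairs this way makes it easy to see how far each setting moves the prior mean before committing a night of sampling to it.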

As an aside, I found that the default prior on gene mutation rates (uniform on 0.5 – 1.5) is not adequate if you have substantial among-gene heterogeneity in rates. I had mtDNA mixed with nuclear loci in my analysis, and the mtDNA rates were far too high for the defaults: the estimated mtDNA rates were simply piling up on the upper bound of the prior distribution (1.5). Because the mean rate across all loci is 1, the ratio of the bounds of this distribution represents the theoretical maximum relative rate difference that you will allow to occur within your dataset. If you have K loci, the theoretical maximum value this can take is K (which, if observed, would require that you have K-1 loci with relative rates approaching zero). So, I suggest using a uniform (0, K) prior on this parameter – it is a uniform distribution, so using an overly broad range isn’t likely to have any pathological consequences for your analyses – and this seems much better than the defaults, which allow at most a 3-fold difference in mutation rates among loci (e.g., upper = 1.5 divided by lower = 0.5).
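The bounds reasoning can be sketched in a few lines of Python. Both function names are hypothetical and neither corresponds to a BEST command; the sketch simply computes the largest fold difference a uniform prior permits and the (0, K) bounds suggested above for K loci.

```python
def max_fold_difference(lower: float, upper: float) -> float:
    """Largest among-locus rate ratio allowed by a uniform(lower, upper) prior.

    Hypothetical helper for illustration only. A lower bound of zero
    places no finite limit on the ratio.
    """
    if lower <= 0:
        return float("inf")
    return upper / lower


def suggested_rate_bounds(n_loci: int) -> tuple:
    """Uniform (0, K) bounds for K loci.

    With a mean relative rate of 1 across K loci, the rates sum to K,
    so no single locus can have a relative rate above K.
    """
    return (0.0, float(n_loci))


# Default prior: uniform on 0.5 - 1.5, so at most a 3-fold difference
print(max_fold_difference(0.5, 1.5))

# Five loci, as in the dataset described above
print(suggested_rate_bounds(5))
```

If the sampled rates for a locus sit right at the upper bound, as the mtDNA rates did here, that is a sign the permitted fold difference is too small for the data.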

Blogging SICB 2010

The beginning of January is marked annually by the Society for Integrative and Comparative Biology (SICB) meeting. This year's was in foggy Seattle. I had planned to post a few blog posts during the meeting (as Rich did for last year's 'Evolution' meeting, e.g., here; or as I did more successfully at the Anolis Symposium). Life is full of good intentions, however, and here I am posting about SICB for the first time as I wait for my delayed flight out of Seattle. A confluence of factors contributed to this negligence, not the least of which was the fact that I wasn't scheduled to speak until this morning. The conference hotel was also charging an (almost unbelievable) $11/day for internet access, which somewhat limited my ability to pop open the laptop and write a quick blog post. (I actually pay $15/month for nationwide broadband access through my phone, so I was not as hampered as others might have been - but tethering the phone and dialing up to the Verizon network is much slower and less convenient than hopping quickly on a wi-fi network would have been.)
Costly internet access aside, this was a great meeting. For people who have not attended SICB in the past, the composition and interests of presenters and posters are much more widely varied than in the major summer meetings. For example, concurrent sessions this morning included a session on "Spiralian Development," another on the "Mechanics of Defensive Structures," a third on "Sexual Selection," and a fourth (my session, actually) on "Predation and Predator Avoidance." With such an eclectic assemblage of sessions, it was pretty easy to identify those to avoid (for example, I did not attend the session on "Neurobiology - Molecular Neurobiology & Neuroanatomy" - no offense to neurobiologists). The meeting by no means lacked for interesting talks. For example, Bob Cox from Dartmouth College gave a fascinating talk on the survival costs of reproduction in Anolis, Katrina McGuigan gave a really interesting talk on the quantitative genetics of intraspecific allometry, and Eduardo Rosa-Molinar gave a fantastically illustrated talk on the neurological basis of reproductive behavior in Gambusia fishes. The latter talk featured both impressive high speed video of mosquitofish copulation and wild three dimensional imagery of associated neural circuitry.
My talk was on the final day of the meeting, which almost felt like it was after the meeting had already ended - since the last day was a half-day of talks and since there had been a concluding reception the previous evening. I talked about using mathematical and computer models to draw inferences about predation regime from the rate and pattern of tail autotomy in several species of Puerto Rican anoles. This project actually arose out of a collaborative venture with a great Harvard undergraduate, Karen Lovely, and perennial "Dechronization" third wheel Luke Mahler. My talk was about as well attended as one could hope for on the last (half) day of the meeting at 8:20 in the morning - but this is a really neat project, so I hope that when our in press article comes out in Evolutionary Ecology Research later this month, a few of the people that read this blog or happened to see my talk will check it out!

Welcome to Our World

Many of you are probably frantically finishing up grant proposals that are due at the NSF this week. Yet, across the river in Bethesda, some changes in the NIH grant proposal length limit are causing a stir. Beginning with the February 5th deadline, the page limit will drop from 25 to 12...and it appears many grant-writers are a little unhappy. This week's Nature News has a little blurb talking about the new format and the reaction it's getting. The featured quote in that piece says, "In the past I would have easily put in at least ten figures. That's impossible now." I bet that all of you trimming references, adjusting line spacing, and resizing figures for your 15-page NSF proposal are salivating at even the thought of having room for 10 figures. Nevertheless, some folks are fine with the new short format, with the expectation that it will force greater clarity. Needless to say, reviewers are also likely to be relieved by the changes. Having recently served on an NIH study section (which is not done confidentially, as it is at NSF), I can say that being faced with a stack of 100-page proposals, with 25 pages of narrative on the project, was a little exhausting. But not nearly as exhausting as writing one myself. Good luck getting those in, everybody.

That Darn LBA

Ah...I remember the days. Being a young grad student, trying to wrap my head around systematic biology and finding myself immersed in, and sometimes confused by, the debates going on in the literature - and in seminar rooms - over parsimony vs. maximum likelihood. One favored topic of discussion was susceptibility to long-branch attraction (LBA). In my case, I went so far as to organize a graduate seminar that involved reading lots of papers and dragging John Huelsenbeck and Mark Siddall up to Burlington in the dead of winter to try to set us straight. I don't know about the rest of my cohort, but I finished the semester thinking that the only reasonable solution was to do my best to sample enough taxa to disrupt LBA as much as possible. And then, much of the controversy died down for a while. Part of this, I speculate, was due to the development and growing popularity of Bayesian inference (BI) in phylogenetic analyses. BI was thought to have an advantage over ML in that it could incorporate uncertainty over the "nuisance parameters" in an analysis. A recent paper in PLoS One by Bryan Kolaczkowski and Joe Thornton, however, has raised the ugly head of LBA again. In this paper, Kolaczkowski and Thornton presented convincing data that BI is very susceptible to inconsistency and bias, particularly in cases of LBA (the "Felsenstein zone") - and that these problems are exacerbated as the amount of sequence data increases, with the posterior probability support values for incorrect clades converging to 1.0. Kolaczkowski and Thornton explored these effects with classical four-taxon trees, with real, known-to-be-problematic datasets (the troublesome Encephalitozoon), and with other datasets with prescripted heterotachy and other heterogeneous parameters in the evolutionary model. Importantly, they contend that "more sophisticated MCMC algorithms and more complex priors" cannot alleviate the bias that BI shows.
The blossoming field of phylogenomics and the desire to incorporate larger and larger matrices into our systematic analyses may thus lead us to produce well-supported but false trees if BI is used and our datasets contain instances of LBA - and really, whose don't? This was a good read with some very important implications. I'm anxious to hear what others think of it.

Fun with new open access journals...

Science is an incremental process, and the foundation upon which new results and discoveries are built is composed largely of previous research. It almost goes without saying that the long-term stability and accessibility of such previous research is fundamental to what we do as scientists. This is why accurate, long-term archiving of published, peer-reviewed research is such a big deal. The recent rise of Open Access journals has raised many questions about how we should deal with long-term archiving.

Accurate archiving of peer-reviewed research means that, once a paper has been published, its identifying attributes - page numbers, volume, etc. - should not be changed. Otherwise, this creates duplication in the literature and makes it difficult to track down potentially important pieces of information. This is why journals are (in my experience, anyway) uncompromising on any further changes once a paper has been officially published.

I was thus surprised to find that a paper I published in 2006 has had, at some point, a change in page numbers. The journal in question is Evolutionary Bioinformatics Online, published by Libertas Academica, which has been the subject of some prior discussion here on Dechronization (see this previous post). The original version was 2006:257-260, now shifted to 2006:247-250, and any attempts to find the 257-260 version on the Evol Bioinfo website will fail - those original page numbers are now part of a different article.

This is - at most - a mild annoyance. Still, it is pretty difficult to track how people are using the software I described in that note, because ISI does not have a record of the article using original pagination - which is what people (including myself) generally cite. So - citing the 257-260 version effectively falls into a black hole. But this does raise some worrisome questions about the long-term management of information, especially if this is not an isolated incident. Has anyone had similar experiences with LA or other open access publishers (or non-OA publishers, for that matter)?

Dechronization: A Review of 2009

Happy New Year everyone! Hope that everyone had a nice holiday time. Thought I'd post a little "wrap-up" of some of the highlights of the year on our blog - at least as I see them. Feel free to chime in!
Dechronization turned 1 in 2009 - that was exciting - and we added two new authors: Dan Rabosky and Liam Revell. We started our interview series, kicking off with Joe Felsenstein, Jack Sullivan, and Rob DeSalle. More to come - who would you like to see interviewed in 2010? There was a lot of Geneious love (and a few snarky accusations that we might be getting paid off by them - we assure you we're not!) and a lively discussion of the swine flu (sorry, H1N1) epidemic - something that got the attention of Medpedia and resulted in our blog being showcased on their site. There were lots of good discussions about Bayesian methods - when they fail and when we fail them (twice). We had some fun with the infamous "Snake with a Foot" photo and a few posts poking at the bizarre PNAS paper touting that "forbidden love" between butterflies and onychophorans had resulted in caterpillar love children. There were posts from the Evolution meetings in Moscow, Idaho in June as well as the Anolis Symposium at Harvard in October, and Rich gave us glimpses into some exciting fieldwork in Haiti. Thanks to everyone (our stats counter says 19,970 returning readers for 2009!) for reading our blog. Looking forward to 2010!