Monday, June 30, 2008

Foreign Dispatches: Reporting from the ASP Meeting in Texas

For my inaugural post, I'm relaying some info from the 83rd meeting of the American Society of Parasitologists, held this year in Arlington, Texas. Taxonomy and systematics have always been a large component of our society, though this year there seemed to be slightly fewer talks in these categories. Following the posts from the Evolution meetings, I have to say that parasitologists are still a little behind the curve. Most systematic studies used the typical genes - 18S, coI - in concatenated analyses with parsimony plus one other method - sometimes Bayesian, sometimes ML. But I think we deserve a little slack. Parasites are notoriously difficult to do molecular biology on - often there's not much material to work with, so you have limited ability to troubleshoot primers or methods. And sure, there are some genomes available for the parasites that infect humans, but often there's just the one, and there is a lot of divergence in some of these groups. Nonetheless, progress is clearly happening and there are some cool things being done. Pete Olson, from the Natural History Museum in London, presented some really cool work exploring the expression of hox genes in tapeworms, animals that have a very different kind of segmentation. Janine Caira (UConn) and Kirsten Jensen (Kansas) and company once again dazzled us with how poorly the parasite fauna of the world is known, by showing how their field work in Borneo has revealed not only dozens of new species of tapeworms, but sometimes new species of hosts as well. Jessie Light (Florida) made us scratch with her work on the taxonomy and phylogeny of human lice, and Mark Siddall (AMNH) hinted that his recent work on ESTs in leeches and their insight into paralogy may eventually explain some of the really bizarre findings of Dunn et al. in their recent animal phylogeny. Stay tuned.

Sunday, June 29, 2008

A Boys' Club No More

It's my pleasure to welcome a new author to this blog -- one whose work might actually be directly beneficial to human society. Dr. Susan Perkins is an expert on the systematics and taxonomy of Plasmodium, the nasty little parasite that causes malaria. She's based at the American Museum of Natural History where she's currently Assistant Curator in the Division of Invertebrate Zoology and a member of the Sackler Institute for Comparative Genomics. In addition to her expertise in invertebrate systematics, parasite biology, and comparative genomics, Susan brings much potential for insight into the fascinating minds of those rare creatures known as the capital 'C' Cladists.

Software Review: BEST version 2.0

As Glor hinted in a recent post, "species tree" analyses are pushing the phylogenetics world towards a paradigm shift. One of the methods currently available to researchers is Liang Liu's computer program BEST (Bayesian Estimation of Species Trees). Version 2.0 is available at the BEST website as both Windows and Mac OS X executables. BEST estimates the posterior distribution of species trees from multilocus, multiple-allele DNA sequence data while attempting to account for deep coalescence of alleles, one of the mechanisms that can result in a mismatch between gene trees and the species tree. Details of the method are provided in a paper by Liu and Pearl published in Systematic Biology (subscription required for PDF download).

The BEST website provides example files for analyses in which a single allele is sampled for each species, as well as for analyses in which multiple alleles are sampled from each species. Both analyses assume that species are reciprocally monophyletic. Given that BEST is a modification of MrBayes, the data formats are very similar except that BEST includes priors for theta and mu. Also, the tree topologies, branch lengths, and mu are unlinked across the sampled loci. If a haploid locus (mitochondrial or chloroplast DNA) is sampled, BEST allows the user to define the ploidy of the locus (default setting is diploid).

If a multiple locus dataset is run in BEST, a sham file is needed to summarize the trees after the burnin (the familiar "sumt" command from MrBayes). Liang Liu has posted an example of this type of file on the BEST website.

The trees are summarized using a burnin value that discards all trees and parameter values sampled prior to convergence. As in MrBayes, summarizing the trees produces a consensus tree file, where the consensus percentages for clades are interpreted as the Bayesian posterior probability. Progress of the BEST run and assessment of convergence can be monitored using the computer program Tracer.
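
For readers new to this step, here is a minimal Python sketch of what summarizing after the burnin amounts to. It is not code from BEST or MrBayes: the tree representation (each sampled tree as a set of clades), the function name clade_support, and the five-sample toy chain are all assumptions made purely for illustration.

```python
from collections import Counter

def clade_support(sampled_trees, burnin):
    """Return the fraction of post-burnin trees that contain each clade."""
    kept = sampled_trees[burnin:]  # discard samples taken before convergence
    if not kept:
        raise ValueError("burnin discards every sampled tree")
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}

# Toy chain (hypothetical): each tree is a set of clades,
# and each clade is a frozenset of species names.
AB = frozenset({"A", "B"})
BC = frozenset({"B", "C"})
ABC = frozenset({"A", "B", "C"})
chain = [
    {BC, ABC},  # early sample, dropped as burnin
    {AB, ABC},
    {AB, ABC},
    {BC, ABC},
    {AB, ABC},
]

support = clade_support(chain, burnin=1)
for clade, freq in sorted(support.items(), key=lambda item: -item[1]):
    print(sorted(clade), freq)
```

The frequencies that come out are what gets read as posterior probabilities on the consensus tree. Picking the burnin value itself is the part that still needs a look at the trace in something like Tracer.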

My laboratory group has been experimenting with BEST for the past few months, and we are generating some interesting and exciting results. The prior on theta appears to be the one issue/nuisance that we have run across in our explorations using BEST. A fairly wide prior is given in the example files. We are beginning to run BEST with narrower, and more realistic, priors for theta. So far the results are promising.

Overall, I have found BEST straightforward to implement with my multilocus phylogenetic data. Familiarity with MrBayes will certainly help new users of BEST. Also, Liang Liu has been very helpful and encouraging to users, and has implemented suggestions into the example files on the BEST website. My entire lab group is excited to be exploring the frontier of phylogenetics, with the hope that we are making the most reasonable inferences regarding species relationships that are afforded by our hard-earned data.

One More Reason to Post Free PDFs of Your Pubs

My effort to use this weekend to catch up on the latest research in journals like Molecular Ecology, Ethology, and Biological Journal of the Linnean Society has been greeted with endless frustration. The incompetent blood-suckers at Wiley-Blackwell have decided to take all of their journals off-line for two days. Un-fucking-believable. Seriously people, let's take science back: get out there and post all your publications as free PDFs. Maybe that way people will be able to actually read them.

Saturday, June 28, 2008

Phylogenetics Grant from the Discovery Institute?

You'd think, given all their problems explaining the history of the horse species we know existed, that opponents of evolution would be loath to add another species to the clade. Not so! The scholars at Answers in Genesis want you to understand - with no uncertainty - that unicorns are real. Now it's our job to complete the taxon sampling required to solve this interesting phylogenetic puzzle. They concede that the unicorn may also have been a relative of the cows, so best to sample broadly. (Image cribbed from Weinstock et al. 2005. Evolution, systematics, and phylogeography of Pleistocene horses in the New World: a molecular perspective. PLoS Biology 3:1373-1379)

Gecko Porn

Well folks, our initiative to incorporate porn in the blog has already paid off with at least three visits from perverted Google users. This week I offer a stunning photo of gecko porn discovered on Flickr by my lab's gecko guru, Daniel Scantlebury. Dan tells me these are Ptyodactylus ragazzii from the family Phyllodactylidae. Now that Verne Troyer has expanded the web of celebrity sex tape scandals, I can only hope that the GEICO gecko will not feel tempted to produce one as well.

Friday, June 27, 2008

Highlights from Evolution 2008: Part II

There were a few notable advances in comparative methods. Emma Goldberg and our own Boris Igic illustrated how several recent rejections of Dollo's law may have resulted from a failure to consider differential rates of species diversification and several other violations of the Mk2 model (see our related previous post on Maddison et al.'s work and the BiSSE method). 

If lineages with the character state A are more likely to speciate, or less likely to go extinct, than lineages with character state B, there will be a tendency to infer an ancestor with state A. Emma and Boris illustrate the practical implications of this sort of differential diversification by showing how it may have resulted in the incorrect inference of re-evolution of wings in stick insects (Whiting et al. 2003; in Nature).  If there is any truth to the widespread belief that vagility is inversely associated with speciation rate, it seems logical to suggest that the non-winged lineages have undergone more species diversification and biased the conclusions of standard methods toward the reconstruction of a non-winged ancestor and repeated re-evolution of wings. 

For those of you who missed the talk, Boris was sporting a mustache to symbolize his allegiance with Dollo.

Thursday, June 26, 2008

TeX and LaTeX: Dork It Up!

So, you think you're a science dork, eh? In my book, you don't get to really wear the dork crown until you write all of your papers in LaTeX. LaTeX is a language for creating documents with TeX typesetting; I don't really know what that means, but I do know it makes beautifully formatted PDF documents. There's also a good free Mac implementation of LaTeX called TeXShop.

Here's the caveat: it does take a bit of effort to learn. It has a bit more in common with writing computer programs than it does with MS Word, for example. If you've ever edited HTML code, it's sort of like that. But the effort is time well spent. Here are the main things I like about using LaTeX.

1. Good bibliography management with BibDesk, and automatic citations and bibliography generation. I like this system better than EndNote because it's free, and it doesn't crash or do unspeakable secret things to your document.

2. Ever tried to get a figure in the right place in Word? This process makes me want to stick flaming skewers in my eye. With LaTeX, you put a reference to a figure in the document, and the program figures out a logical place to put it.

3. Equations. Word's equation editor has gotten better, and LaTeX requires some learning of syntax, but once you get it, it works beautifully. All of my math geek friends use LaTeX all the time.

4. Readability. LaTeX documents are easier to read than Word documents.

5. Integration with R through Sweave. You can even make documents where figures and results are generated on the fly from your data when the file is processed - so if your data changes, the paper is updated automatically.

6. Reign over other dorks. Being good at LaTeX is the computer equivalent of wearing a Tron costume and speaking Klingon (the warrior's tongue). AT THE SAME TIME.

Wednesday, June 25, 2008

Highlights from Evolution 2008: Part I

First some highlights from talks on phylogenetic reconstruction in general:

1. The species trees have arrived.

2. 5 to 20 'independent' loci analyzed via partitioned analyses in MrBayes or RAxML are the standard for high-end phylogenetic analyses (of non-model groups). Most people were concatenating, but that may be changing (see point 1).

3. Branch lengths are a growing concern in Bayesian phylogenetic analyses. Both Joseph W. Brown (Michigan) & Jeremy M. Brown (UT Austin) showed that our prior assumptions about branch lengths require revision. Joseph Brown gave a nice example of how shifting to more appropriate (i.e., better fitting) assumptions about branch lengths can turn bad trees to good in the case of paleognathous birds.

Tuesday, June 24, 2008

A New Paradigm? Species Trees From Gene Trees

Lots of talk about a 'new paradigm' today at a symposium on generating species trees from gene trees at the Minnesota evolution meetings. If most of the speakers in this symposium have their way, the days of generating individual gene trees or trees from concatenated datasets will soon be in the past. Talk of such an advance dates back more than a decade, but things have moved very quickly over the past two years. This seems due in large part to the emergence of the software package BEST by Liang Liu. I've received mixed reports about BEST's accessibility and limitations. As one might expect from a new package, it's a bit buggy and may be a bit less user-friendly than you'd like, but it is working for most people. Liang Liu said during his presentation that version 2.0, posted on June 18, 2008, is considerably better than the previous iteration, so you'd be well-served to upgrade if you've already been messing with it. More generally, the existing methods for generating species trees from gene trees make some potentially significant assumptions (e.g., the absence of horizontal gene transfer), and the consequences of violating them are not well understood. In any case, you'd better move BEST to the top of the list of programs you need to learn...

Thursday, June 19, 2008

See You in the Twin Cities?

We're just one day away from Evolution 2008! I don't know about you, but I'm at least half way finished analyzing the data I plan to present...

Igic and I are sharing a room and look forward to hosting late-night conversations about the latest phylogenetic methods over strong beverages.

Be prepared to enter a mac-free environment. According to the message that was just sent to presenters, we'll be forced to enter the wonderful world of Microsoft by running our presentations on Windows 2003 machines.

Wednesday, June 18, 2008

A Lab Milestone

I'm sending out congratulations to my first PhD student - Julienne Ng - on successfully passing her quals! I felt I needed to do something momentous for the occasion, so I baked my first cake (with supervision from my lovely baking coach). So that others would remember it for something other than its poor taste, I crafted it into the shape of an Anolis distichus head (one of the species Julienne is studying for her thesis). I realize it looks more like a green komodo dragon with an infected tracheotomy, but you need to walk before you can run. Julienne, I don't want to ruin the surprise, but I'm already planning a three-dimensional cake-based sculpture of two fighting anoles for your defense.

Evolution hits the big-time: Spore

You know your field has hit the big-time when they make it into a video game.  In the upcoming game from Maxis, Spore, you control evolution:


I'm not really sure how this is going to work, but it appears that you will be able to guide the evolution of a creature from the primordial ooze, onto land, and eventually into space.  There's more information on the wiki page for Spore.

It seems that every time evolution trickles into popular culture, it's basically the same story of progress up the ladder of life, from microbes to fish to lizards to man.  One of the goals of Spore is to crawl out of the water onto land; you can spend your "DNA points" to do so. Entirely nonphylogenetic, and I just out-dorked the videogame dorks.

Still, biodiversity prevails: before the release of the game, the company has released a creature creator that allows people to create critters of their own.  Currently more than two creatures per second are being created.  The maximum estimate I've come across for the total number of species on Earth is 100 million (but we have no idea). At this rate, it will take about two years for Spore to catch up with Earth.  But who in their right mind would create the Aye-aye? Or the ocean sunfish?  And, given enough time, will Virginia opossums evolve civilizations and space travel?  Only time will tell.
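
For anyone who wants to check the back-of-the-envelope math, here is a quick Python sketch. It uses the two numbers quoted in the post (100 million species, two creatures per second); the function name and the 365.25-day year are my own assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumes an average 365.25-day year

def years_to_match(total_species, creatures_per_second):
    """Years needed to create as many creatures as there are species."""
    return total_species / creatures_per_second / SECONDS_PER_YEAR

years = years_to_match(100_000_000, 2.0)
print(f"{years:.2f} years")
```

At exactly two creatures per second this comes out closer to a year and a half than two, and a faster creation rate only shortens it.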

Tuesday, June 17, 2008

Blast Tree View

While putzing around NCBI, I just noticed that Blast tools now include something called Blast Tree View, which "features new distance measures, tree downloading, re-rooting, simplification and sequence grouping." This could either be a nice way to quickly visually assess quirks in your search results or the beginning of an avalanche of disastrously unprofessional figures in prominent journals. I won't venture to guess which way it will go. 

Monday, June 16, 2008

Need Markers? Come 'n Get'em!

It wasn't long ago that molecular phylogenetic studies of animals relied almost exclusively on a single molecular marker: mitochondrial DNA. Graduating with my PhD in 2004 I was part of the last generation of scientists to get away with this. Thankfully, the biggest obstacle to multilocus studies (i.e., the availability of PCR primers for non-model organisms) is becoming a thing of the past with the availability of genomic sequence data. Two recent studies contribute hundreds of new PCR primers for potentially phylogenetically informative regions of reptiles (including the feathered variety).

Townsend et al. (2008; Rapid development of multiple nuclear loci for phylogenetic analysis using genomic resources: An example from squamate reptiles. Molecular Phylogenetics and Evolution [subscription required] 47:129-142.) use bioinformatic analysis of Fugu, Homo, and Gallus genomes to develop primers for amplification of long, continuous exonic sequences in squamate reptiles (although they also discuss the potential broader taxonomic utility of their markers). They provide >80 loci, but only offer preliminary sequence data to ascertain phylogenetic informativeness for 25 of these. The results were encouraging: the new loci seemed to support previously established relationships, and many exhibited considerably greater variation than the well-used RAG1, which seems to perform particularly well among the suite of "stock" loci that have been used rather intensely over the past few years. Their utility is further borne out by Wiens et al.'s new analysis of snake relationships in the latest issue of Systematic Biology (subscription required). As an aside, I couldn't help but notice that Townsend et al. use Invitrogen's VectorNTI software for primer development. This software is supposedly available for free to academic users, but I've had a hell of a time getting a license. I tried writing and calling a few months ago, but had difficulty getting an authorized license to work and have never gotten around to using the program. Anybody else have this problem (or a solution)?

Backström et al. (2008; Genomics of natural bird populations: a gene-based set of reference markers evenly spread across the avian genome [subscription required]. Molecular Ecology 17:964-980.) identify and sequence an even larger, and more diverse, set of loci by comparing the complete genome sequence of the chicken to ESTs and trace files from the zebra finch genome. Because they're more interested in asking questions at the population level, they test the utility of their markers for detecting intraspecific/phylogeographic variation and note their discovery of over 800 SNPs across several unrelated individuals of collared flycatchers. Unlike the Townsend study, indels are included in Backström et al.'s samples. They were also able to space their markers throughout the genome. Scott Edwards does a nice job of putting this advance in perspective.

Friday, June 13, 2008

Hot Off the Press: Evolution 62(6)

Another slow month for comparative methods. There are, however, at least five phylogeographic studies to check out (involving admiral butterflies, salamanders, ducks, and freshwater fish). Most involved fairly standard combinations of mitochondrial and nuclear markers analyzed with the usual suite of methods (e.g., trees, haplotype networks, AMOVA, coalescent methods). I was a bit surprised to see the salamander study by Bos et al. relying so heavily on mismatch distributions; I thought the ambiguity involved in interpreting such figures had made them a thing of the past. Perhaps even more surprisingly, Burridge et al. managed to sneak in a mtDNA-only analysis!

Wednesday, June 11, 2008

Evolution 2009: pack your skis?

If anyone out there is planning on attending the Evolution Meetings in 2009 here in Moscow, ID, here's a sneak preview of the inland northwest. If you can't tell, that's snow falling in my backyard yesterday (June 10). I don't think this is typical, so don't pack your sleds just yet.

We're all really looking forward to hosting next year; it should be a blast.

Tuesday, June 10, 2008

Software Review: Geneious Pro Bioinformatics Suite

Are you looking for a fully featured program for the examination and analysis of DNA sequence data? Are you hoping for a way to avoid paying thousands of dollars for those über-annoying USB keys from Sequencher? The answer has arrived and its name is Geneious.

Motivated by my desire to never purchase a single Sequencher key, I stumbled across this program when I was starting my lab a year or so ago. It was a revelation. Geneious' beautiful graphical user interface offered all of the Sequencher utilities I was looking to replace (e.g., viewing and editing of electropherograms, generation of contigs from individual sequencing runs). The only important feature that was missing when I first used Geneious was a function to identify potentially heterozygous positions, but, to my great surprise and delight, the Geneious developers added this feature within just a few days of my requesting it. This may be Geneious' greatest asset: a group of young developers who seem eager to respond to user requests and proactively expand their software's scope. Evolutionary biologists are particularly lucky that Geneious' inventor - Alexei Drummond - is one of us.

As far as phylogenetic applications go, Geneious absolutely blows Sequencher out of the water. First, Geneious has some awesome features for visualizing aligned datasets (see figure). It also has a growing suite of built-in tree building algorithms. If you own a copy of the command line version of PAUP, you can use Geneious to implement analyses using PAUP's algorithms for Maximum Likelihood, Parsimony, and Neighbor-joining. Geneious will even run Modeltest for you before ML or NJ analyses (although it doesn't yet tell you which model it actually chooses, this is supposedly being fixed for the next release). Although users are permitted essentially limitless flexibility by inputting their own PAUP command blocks for these analyses, the same flexibility is not, unfortunately, provided in Geneious' application of MrBayes. At the present time, this application is limited to unpartitioned analyses with default parameter settings for variables such as temperature of the heated chains. Hopefully this feature will be upgraded in the near future.

If I have one remaining complaint about Geneious it is that it runs frustratingly slowly at times. Perhaps this is just an unavoidable outcome of trying to display so much data. Nevertheless, I've been running Geneious on fairly new OSX and WindowsXP machines (all with >2 GHz dual core processors and >2 GB of memory) and I'm not sure I'd try it with anything less.

In the final analysis, Geneious is not only an order of magnitude cheaper than Sequencher, it's also a better program. (Full disclosure: For the last few months, I've been a beta tester for Geneious and have been provided with one free license as a result. Having said this, I fell in love with this program long before receiving this perk and have put my money where my mouth is by purchasing two licenses for my lab.)

God Is the Architect of Rapid Species Diversification!

Ever since visiting the creation museum on my way home from last year's herpetology meetings I've been on the "Answers In Genesis" mailing list spearheaded by the museum's mastermind Ken Ham. The latest mailing makes all the junk mail about creation DVDs worth it. In it, we are introduced to the field of "Baraminology", which involves the identification of the created kinds that traveled with Noah on his ark. Apparently most of these kinds "have diversified so that today they are typically represented by a whole family" (apparently God is a Linnean; if this doesn't convince the rank-free taxonomists to throw in the towel I'm not sure what will). It gets better when the author goes on to say that the presence of so many distinct forms from the original kinds over a relatively short period of time suggests that "diversification occurred very rapidly." Apparently they're up to speed on the latest literature. The evidence cited is Herrel et al.'s recent study of rapid evolution in lizards. The accompanying article is actually quite well-written and picks up on the major deficiencies of the study (e.g., the reliance on mtDNA alone to infer population history and the absence of convincing evidence that the observed changes are genetic [something the paper's authors suggest without supporting]). Maybe Herrel et al. can get a grant to do follow-up work from the Discovery Institute?

Saturday, June 7, 2008

I Smell Something Fishy

Just a note to welcome our newest contributor, Prof. Thomas Near of Yale University. Tom is the world's authority on darter fish and a respected innovator and expert in the area of dating molecular phylogenetic trees. Tragically, he's also a die-hard Cubs fan.

Turtle Porn

We've been joking about weekly porn postings to elicit more Google hits. Turns out that this works: some pervert from Germany actually arrived at our site last week after conducting a Google search for 'invertebrate porn.' Congratulations you sick fuck, you just made my week. Given this success, I offer the following shot of snapping turtles in the act. This one is in honor of my new pet snapping turtle, a hatchling named Goldschmidt (anybody care to guess what motivated me to give the little beast this name?). One of my graduate students found baby Goldschmidt wandering near the Genesee River not far from campus. Since snappers usually don't even start nesting in NY until June, I'm assuming this little bugger over-wintered in the nest.

Friday, June 6, 2008

The Rich Get Richer

The world's most species-rich amniote genus has just welcomed a new member: Anolis cuscoensis (Poe et al. 2008. Journal of Herpetology 42:251-259). This beautiful green animal from Peru is a member of an ancient anole lineage with representatives in both mainland Central and South America and the Lesser Antilles (the lineage was prematurely assigned to the genus Dactyloa by Guyer and Savage). This lineage is only distantly related to the species that comprise the species-rich Greater Antillean fauna. Steve Poe has picked up the torch from his undergraduate mentor - the famous anole guru Ernest Williams - by conducting much needed alpha taxonomic work on the relatively poorly known South American anole fauna. I just wish brother Poe had put together the type of phylogenetic analysis that would have been deserving of his considerable talents. There is a tree showing the position of the new species, but it lacks any indication of support (it notes only that all nodes are supported by bootstrap values >50). Moreover, they state that the tree is based on 1,666 parsimony informative characters, but they gathered data on 81 morphological traits from the new species: it seems, therefore, that the molecular data making up the majority of the dataset are still missing for Anolis cuscoensis. Oh well. Nit-picking aside, this paper goes much further than most species descriptions. Go team!

Wednesday, June 4, 2008

Over- Vs. Underpartitioning in Bayesian Phylogenetics

Although it was published nearly a year ago, I'm still learning from Brown and Lemmon's important contribution on partitioning Bayesian phylogenetic analyses (The importance of data partitioning and the utility of bayes factors in Bayesian phylogenetics. Systematic Biology 56:643-655). One of the things they're most likely to be cited for is the conclusion that under-parameterization is considerably more problematic than over-parameterization. This result is evident in their Fig. 4, which uses the same type of visualization implemented by AWTY. The dots in this figure indicate posterior probability values for individual nodes obtained from two separate analyses. The degree to which points stray from the diagonal is an indication of how much the results of two Bayesian analyses disagree about support for particular nodes. As you can see in this figure, points tend to stray more from the diagonal in the plots toward the upper right corner of this figure (where analyses are under-parameterized) relative to the lower left corner (where models are over-parameterized). Although it seems clear that partitioning can have an important impact on the tree topologies obtained from Bayesian analysis, it also seems worth noting that the worst case scenario - nodes that are strongly supported under one partitioning strategy while absent in the other partitioning strategy - is never realized in Brown & Lemmon's simulations. Has anybody obtained such a result with real data? I tend to get plots similar to Brown & Lemmon's, leading me to believe that really well-supported nodes are generally robust to alternative partitioning strategies.
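
If you want to check your own runs for that worst case without eyeballing a plot, the comparison is easy to sketch in Python. To be clear about what is assumed here: the clade labels and support values below are invented toy numbers, not results from Brown & Lemmon or from any real analysis, and the function names are mine rather than anything from AWTY.

```python
def compare_support(run_a, run_b):
    """Pair up posterior probabilities for every clade seen in either run."""
    clades = sorted(set(run_a) | set(run_b))
    # A clade that never appears in a run is treated as having zero support there.
    return [(c, run_a.get(c, 0.0), run_b.get(c, 0.0)) for c in clades]

def largest_gap(points):
    """Return the point that strays furthest from the diagonal."""
    return max(points, key=lambda p: abs(p[1] - p[2]))

def strong_conflicts(points, threshold=0.95):
    """Clades strongly supported in one run but entirely absent from the other."""
    return [c for c, a, b in points
            if (a >= threshold and b == 0.0) or (b >= threshold and a == 0.0)]

# Hypothetical posterior probabilities under two partitioning strategies.
unpartitioned = {"AB": 0.99, "CD": 0.62, "ABCD": 1.00}
partitioned = {"AB": 0.97, "CD": 0.41, "EF": 0.30, "ABCD": 1.00}

points = compare_support(unpartitioned, partitioned)
clade, p_a, p_b = largest_gap(points)
print("largest disagreement:", clade, round(abs(p_a - p_b), 2))
print("strong conflicts:", strong_conflicts(points))
```

An empty list of strong conflicts is the pattern described above: the weakly supported nodes wander off the diagonal while the well-supported ones stay put.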


Tuesday, June 3, 2008

You Know You're a Phylogenetics Geek When...

...you try to model the human mind in the same manner that you model trait evolution. Last week I sat on my second thesis defense committee. The student who was defending did an excellent job; in fact, my biggest criticism was that he was being overly ambitious in attempting distinct empirical and theoretical studies that could have been better integrated. The goals of his empirical study were very well-defined and goal oriented, but, by his own admission, his theoretical interests were evolving in a manner analogous to a random walk. My spur of the moment suggestion was that he switch the model underlying his theoretical meandering from something like Brownian motion to something that more closely resembles an Ornstein-Uhlenbeck process constrained in some non-deterministic manner to the same general area occupied by his empirical work. Bask in the glow of my dorkness.
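
For the non-dorks, the difference between the two models fits in a few lines of Python. This is a rough discrete-time sketch with a unit time step, not a proper treatment of either process; the parameter values (sigma = 1, alpha = 0.2, 200 steps, 300 replicates) are arbitrary choices of mine.

```python
import random
import statistics

def simulate_endpoint(n_steps, sigma, alpha, optimum, rng, start=0.0):
    """Walk a trait for n_steps and return where it ends up."""
    x = start
    for _ in range(n_steps):
        # alpha = 0 is plain Brownian motion; alpha > 0 adds an
        # Ornstein-Uhlenbeck-style pull back toward the optimum.
        x += alpha * (optimum - x) + sigma * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(42)  # fixed seed so the comparison is repeatable
bm_ends = [simulate_endpoint(200, 1.0, 0.0, 0.0, rng) for _ in range(300)]
ou_ends = [simulate_endpoint(200, 1.0, 0.2, 0.0, rng) for _ in range(300)]

bm_var = statistics.pvariance(bm_ends)
ou_var = statistics.pvariance(ou_ends)
print(f"Brownian motion endpoint variance: {bm_var:.1f}")
print(f"OU-style endpoint variance:        {ou_var:.1f}")
```

The Brownian walkers keep spreading out the longer they run, while the constrained ones stay in the neighborhood of the optimum, which is roughly where I wanted the theoretical chapters to stay.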

Stop Using the Mantel Test

Lots of folks have been using the Mantel test to ask questions about phylogenetic signal and character correlation. Luke Harmon (U. of Idaho) and I just ran a bunch of simulations to see how these tests performed relative to alternatives (Blomberg et al.'s K statistic for tests of phylogenetic signal and independent contrasts for tests of character correlation). The results were not encouraging and strongly suggest that the Mantel test should only be used with data that can only be expressed as pairwise distances (e.g., geographic distances among populations). The Mantel test was designed specifically to deal with pairwise distances because this type of data cannot be analyzed using standard statistical methods due to a lack of independence. If individual measures (which can be analyzed using standard statistical methods) are converted to pairwise distances, the resulting Mantel tests suffer from low power and, in the case of partial Mantel tests applied to tests of character correlation, elevated type-I error. I'm not really sure why people have been converting their data to pairwise distances and conducting Mantel tests, but this practice should end. I'm reluctant to write more here because we've submitted our results as a note to Evolution and don't want to give them an excuse not to publish. Write to me or Luke if you want more details.
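
For anyone who has only ever pushed the button, here is a bare-bones Python sketch of a simple Mantel test used the way it was intended, on two sets of pairwise distances among populations. It is not the code behind our simulations: the six populations, their positions, and the "genetic" distances are all made up, and a real analysis should use an established implementation.

```python
import math
import random

def upper_triangle(matrix, order):
    """Flatten the above-diagonal entries, with rows and columns reordered together."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def mantel(a, b, n_perm=999, seed=0):
    """One-tailed Mantel test: observed correlation and permutation p-value."""
    rng = random.Random(seed)
    identity = list(range(len(a)))
    b_vec = upper_triangle(b, identity)
    observed = pearson(upper_triangle(a, identity), b_vec)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # relabel populations; pairwise structure stays intact
        if pearson(upper_triangle(a, order), b_vec) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)

# Hypothetical populations along a transect (positions in km).
positions = [0, 1, 2, 3, 5, 8]
n = len(positions)
geo = [[abs(positions[i] - positions[j]) for j in range(n)] for i in range(n)]
# Made-up genetic distances that grow with geographic distance plus a little noise.
gen = [[0.0 if i == j else 0.1 * geo[i][j] + 0.02 * ((i + j) % 3)
        for j in range(n)] for i in range(n)]

r, p = mantel(geo, gen)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Note that whole populations are shuffled rather than individual distances; that is how the test copes with the non-independence of pairwise data, and it is also why running per-population measurements through it throws information away.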