We've talked a bit about the use of R for phylogenetic comparative analyses. We love it. One of the biggest complaints with this software has always been its steep learning curve. Paradis' book takes us a big step toward the elimination of this obstacle.
By introducing the R package known as ape, Paradis has already done more than anyone to bring R to phylogeneticists. His book now makes it possible for even a novice R user to get their feet wet with a broad range of R applications related to phylogenetics and evolution. Although many of the R applications to evolutionary biology introduced by Paradis remain relatively primitive (e.g., direct analysis of sequence data, reconstruction of phylogenetic trees), the phylogenetic applications he discusses are at the bleeding edge. The flexibility R offers for graphical output of phylogenetic trees is also unrivaled.
There are really only two problems worth noting with this book, one ironic, the other tragic. The irony is that a book espousing the benefits of free software (and even written using the free typesetting software LaTeX) is anything but free itself. The bloodsuckers at Springer are actually trying to extract >$50 for this slim 211-page paperback. If you're short on cheddar you might try contacting Paradis; word on the street is that he's a good dude. The tragic problem is that the book is already out of date and incorrect in places, due in part to some untimely, and seemingly unnecessary, revisions of ape's code by Paradis himself. I was tearing my hair out for a good long while before I figured out that he had changed the format of the $edge portion of ape's default tree format. Internal nodes were previously labeled with negative integers, but are now coded with positive integers.
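For anyone stuck with edge matrices saved under the old labelling, here is a minimal sketch of the renumbering, written in Python rather than R just to show the logic. It assumes the old convention numbered tips 1..n and internal nodes -1, -2, ... with the root as -1, and that the new convention keeps the tip labels and numbers internal nodes from n + 1 upward with the root at n + 1. The function name and the toy tree are made up for illustration; this is not ape's own conversion code.

```python
def renumber_internal_nodes(edges, n_tips):
    """Map negative internal-node labels to positive ones.

    Assumption: tips are 1..n_tips and internal nodes are -1, -2, ...
    (root = -1). Node -k becomes n_tips + k, so the root ends up at
    n_tips + 1. Tip labels are left untouched.
    """
    def convert(node):
        return n_tips - node if node < 0 else node

    return [(convert(parent), convert(child)) for parent, child in edges]


# Hypothetical three-tip tree ((1,2),3) written as (parent, child) rows.
old_edges = [(-1, -2), (-2, 1), (-2, 2), (-1, 3)]
new_edges = renumber_internal_nodes(old_edges, n_tips=3)
print(new_edges)  # [(4, 5), (5, 1), (5, 2), (4, 3)]
```

Check the result against a tree freshly read by your installed version of ape before trusting it, since the ordering of internal nodes may differ from this simple mapping.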
R will change the way you do science. Paradis will help. The good will of the community will get you the rest of the way.
Saturday, May 31, 2008
In spite of numerous noble efforts (e.g., TreeView, TreeEdit), the phylogenetics community has always lacked a simple, fully-featured application for viewing trees and their associated features. FigTree comes closer to meeting this need than anything that has come before. This program focuses exclusively on displaying trees and producing "publication-ready figures" and does not actually conduct any analyses (even things as simple as generating consensus topologies). Nevertheless, it has quickly become one of the most important tools in the phylogeneticist's box. Its easy-to-use graphical user interface permits users to do everything from selectively shading branches to displaying support values or branch lengths. Trees can then be exported to the PDF format, whether it be for publication or subsequent revision in a program like Adobe Illustrator. Although some users might have been hoping for exports in JPEG format, I'm glad this isn't included. JPEG files, of course, result from conversion of high quality vector graphics to compressed, rasterized graphics that are invariably of a lower quality than the originals. Nothing has done more to contribute to the hideous pixelated images that grace the pages of your favorite journals than the use of JPEG files (or other similar formats).
Saturday, May 24, 2008
In what will surely become one of the most influential papers of 2007, Maddison and colleagues propose a new model to resolve an important and often unrecognized problem in ancestral state reconstructions and studies of key innovations.
Specifically, the effect of character states on the process that generates the observed phylogeny (speciation and extinction rates differ depending on whether the lineage is in state 0 or 1, for example) frequently made it almost impossible for previous models used in reconstruction of ancestry to infer the correct ancestral states and transition rates. The opposite is also true--the inaccurate inference of ancestry made it impossible to infer the correct speciation and extinction rates associated with each character state. The new model, named BiSSE (binary state speciation and extinction), is implemented in Mesquite.
Plainly stated, if one wishes to analyze the evolution of a character with two states, each of which is associated with different speciation and extinction rates, the use of the old Mk-family of models is likely inadequate. Equal net diversification rates for alternate states are unlikely, as are equal transition rates.
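To make the contrast concrete, here is a small sketch of the six BiSSE parameters (a speciation rate, an extinction rate, and an outgoing transition rate for each of the two states). The class name, method names, and numbers are all made up for illustration; this is only bookkeeping, not the likelihood calculation that Mesquite performs.

```python
from dataclasses import dataclass


@dataclass
class BisseParams:
    lambda0: float  # speciation rate in state 0
    lambda1: float  # speciation rate in state 1
    mu0: float      # extinction rate in state 0
    mu1: float      # extinction rate in state 1
    q01: float      # transition rate from state 0 to state 1
    q10: float      # transition rate from state 1 to state 0

    def net_diversification(self):
        """Net diversification (speciation minus extinction) per state."""
        return (self.lambda0 - self.mu0, self.lambda1 - self.mu1)

    def is_state_independent(self, tol=1e-12):
        """True when the character has no effect on speciation or extinction."""
        return (abs(self.lambda0 - self.lambda1) <= tol
                and abs(self.mu0 - self.mu1) <= tol)


# Hypothetical values: state 1 diversifies faster than state 0.
params = BisseParams(lambda0=0.2, lambda1=0.5, mu0=0.05, mu1=0.1,
                     q01=0.01, q10=0.02)
r0, r1 = params.net_diversification()
print(r0 < r1, params.is_state_independent())
```

Only when `is_state_independent()` holds does the tree-generating process ignore the character, which is the situation the older Mk-type models implicitly assume.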
And who wants to study characters that do not affect net diversification rates?
Stochastic mapping is a method for inferring the position of mutational changes or shifts in morphological or ecological traits on phylogenetic trees. It's a cool method with lots of potentially interesting applications. Although basic stochastic mapping can now be implemented in Mesquite, Jonathan Bollback's program SIMMAP permits one to expand the basic methodology a bit further (one can use SIMMAP, for example, to test character correlations). Unfortunately, this program was removed from its original web-site (the one referenced in Bollback's BMC Bioinformatics application note) and isn't readily accessible via internet searches. Fortunately, he has just provided a link to the program's new page. Check it out. Looks like a new version is on the horizon...
Supercomputers or computing clusters are now a popular solution to the computational challenges posed by increasingly large phylogenetic datasets. By using a cluster, you can speed a typical MrBayes run up by at least eight times (by running each of the eight chains required by the default MCMCMC settings on a different processor). The obvious problem with these resources, of course, is that many users don't have access to a cluster. Fortunately, this is beginning to change. One emerging resource is the CIPRES portal, which offers public access to computing resources at the San Diego Supercomputer Center. Although some have complained that this massively multi-PI, NSF-funded resource has been slow to develop, there has been tangible and important progress over the past few years. At this point, users can implement some of the most popular applications in phylogenetics (e.g., PAUP*, MrBayes, RAxML) through a web interface. In most cases, unfortunately, this interface is limiting; for example, some of the most popular options in MrBayes (e.g., partitioning) and PAUP* (e.g., multiple randomized sequence addition replicates in a heuristic search) remain unavailable. Nevertheless, they're aware of these limitations and I've been told that improvements are on the horizon. This is an important resource and I want very much for it to succeed.
Friday, May 23, 2008
We're going to try to avoid the whole evolution versus creationism thing here. There are enough other blogs and resources covering this topic. It is fun sometimes, though, to see how phylogenetic uncertainty has gotten tangled up in the debate. Some of these fanatics are actually reading our papers and talking about discordance among markers! It's a good read; they cite lots of the papers from the 90s about reconstructing the early history of the tree of life. Who wants to explain the conceptual basis of gene tree/species tree conflict to somebody who thinks the Flintstones are a documentary? Love that quote from the Moonie Jonathan Wells's 2000 book: "Inconsistencies among trees based on different molecules, and the bizarre trees that result from some molecular analyses, have now plunged molecular phylogeny into a crisis."
I wonder if they're going to revise in light of Dunn et al.?
Thursday, May 22, 2008
It is difficult to grasp the pace of scientific progress these days. One benchmark is how we do our work. The most common programs people use to carry out their analyses were unavailable when I started my PhD research, and many of the key methods had not been invented yet. This could be because we're in a particularly fruitful time for research, but I tend to think it's more of a sign of things to come. We need to be prepared for the fact that the next batch of scientists, 5-10 years from now, will be applying techniques that have not even been invented yet on data sets that we can hardly imagine. This perspective is expressed well by Ken Robinson. (Thanks to Larry Forney for pointing me to that video).
What does this mean? To me, it suggests that there's a serious lack in training of graduate students. In other math-intensive fields (like physics), students are required to take a variety of math courses to prepare them to deal with complex data and equations. In biology, students sometimes take these courses, but it's usually not required. I think it's a key ingredient for success in an uncertain and fast-moving future.
What classes are the most valuable? To me, these have had the most pay-off:
1. Probability (something more advanced than a basic stats course)
3. Matrix algebra
Take a math course or two; it won't kill you.
Wednesday, May 21, 2008
The new editors will stop at nothing to spice up this journal; the May number's cover features two chickens doing the nasty. If you're interested in plant mating systems and genetics you're going to love this number. If you're looking for phylogenies, the reading material is less fertile. Hedtke et al. do some fairly standard phylogenetic analyses with Garli and MrBayes in their analysis of androgenesis (presence of father-only nuclear chromosomes in offspring) in the clam genus Corbicula. They use Brown & Lemmon's program MrConverge to diagnose convergence of their Bayesian analyses, but provide the same link for this program that has been down for weeks. There's also an intriguing paper by Egan et al. on the identification of host-specific loci using a genome scan. The approach uses >400 AFLPs to identify loci under selection in populations of beetles specialized for different host trees. It seems like a reasonable, if crude, option for identifying loci under selection in natural populations when candidate gene or QTL studies are not feasible.
Monday, May 19, 2008
Sunday Book Review: Evolution: What the Fossils Say and Why it Matters by Donald Prothero (Columbia University Press, 2007)
Yes, I know it's Monday, but I was busy with graduation yesterday...
Prothero takes an impressively comprehensive approach to debunking the claims that creationists have made about the fossil record. The most useful part of the book debunks creationists' claims that the fossil record contradicts Darwinian theory. He hits all the creationists' favorites, from the Cambrian explosion and its implications to the proposed absence of transitional forms between ungulates and whales. Prothero rarely minces words in delivering a major smack-down to ignoramuses like Duane Gish. Although the details may leave some hard-core evolutionary biologists a bit unsatisfied, Prothero provides all the references to the primary literature that are needed to fill in the gaps. This will be an important reference work.
I do wish he had made a bit more of an effort to integrate the stunning new conclusions revealed by molecular phylogenetic analyses, which serve to further reinforce the validity of Darwin's theory. On a related point, I also can't stand to see so many phylogenetic trees without one iota of support. In some cases, failure to consider molecular phylogenetic studies and phylogenetic uncertainty results in presentation of potentially outdated relationships, like the repeated depiction of turtles as the outgroup to all other extant reptiles and birds (Fig. 5.4 & Fig. 11.1 [which also appears to suggest that snakes are the sister taxon to lizards]). The position of turtles remains controversial, but numerous molecular phylogenetic analyses suggest that they may be closely related to archosaurs rather than branching off at the base of the reptile lineage.
Sunday, May 18, 2008
As part of our Saturday ritual to boost our google hits I've scoured the internet (i.e., did one google image search for "lizard porn") for the finest examples of lizard porn. I don't advise anybody else to do the same: people are fucking sick.
One thing is clear from my search: Anolis carolinensis is a porn star. The internet's offerings are dominated by this species. Perhaps its self confidence has been buoyed by the recent sequencing of its complete genome?
Thursday, May 15, 2008
A few weeks ago I wrote about my frustration with learning about the program Are We There Yet (AWTY), which was designed to help diagnose convergence in Bayesian phylogenetic analyses. Now, with some help from Dan Warren (one of the package's developers), I'm starting to figure it out. It's a diamond in the rough, valuable not only for diagnosing convergence, but also for visualizing similarity among alternative tree topologies and support values (I was tipped off to the latter use by McGuire et al.'s recent Sys. Bio. paper on hummingbirds). Check out the figures generated using the "Compare" feature. This figure compares posterior probability values for each node in two samples of trees (the axes range from 0-100 and represent posterior probabilities). Nodes that have the same posterior probability in both sets of trees will fall along the diagonal. Nodes whose posterior probabilities differ between the two samples will fall off this diagonal. AWTY can also identify topological incongruence: if a node is present in one tree and absent in the other it will have a 0 along one axis.
The figure on the left is from two independent analyses that used the same data and partitioning strategy. The relatively tight fit of this data to the diagonal indicates similarity of support values and topologies between these two samples. The figure on the right is from analyses run using different partitioning strategies. Although the scatter is somewhat higher, most nodes whose support differs between the samples are relatively poorly supported in both. Moreover, there are no nodes with posteriors of >20 in one sample that are absent from the other, indicating topological concordance.
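The idea behind this kind of comparison plot can be sketched in a few lines. The following is not AWTY's code, just a minimal illustration that assumes each sampled tree has already been reduced to the set of clades it contains; the function names and the toy samples are hypothetical.

```python
from collections import Counter


def split_posteriors(trees):
    """Percentage of sampled trees that contain each clade."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade: 100.0 * n / len(trees) for clade, n in counts.items()}


def compare_samples(sample_a, sample_b):
    """Pair up clade posteriors from two samples.

    A clade missing from one sample gets 0 on that axis, which is how
    topological incongruence shows up in the scatter plot.
    """
    post_a = split_posteriors(sample_a)
    post_b = split_posteriors(sample_b)
    clades = set(post_a) | set(post_b)
    return {c: (post_a.get(c, 0.0), post_b.get(c, 0.0)) for c in clades}


# Toy samples of four rooted trees each on taxa A-D (made-up data).
AB, CD, ABC = frozenset("AB"), frozenset("CD"), frozenset("ABC")
AC, BD = frozenset("AC"), frozenset("BD")
run1 = [{AB, CD}, {AB, CD}, {AB, ABC}, {AC, BD}]
run2 = [{AB, CD}, {AB, ABC}, {AB, ABC}, {AB, CD}]

pairs = compare_samples(run1, run2)
for clade, (x, y) in sorted(pairs.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(clade)), x, y)
```

Each `(x, y)` pair is one point in the plot: points with `x == y` sit on the diagonal, and a clade like AC, sampled in one run only, lands on an axis.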
Good stuff. Now I just need to figure out how to get around the restrictive web interface...
Wednesday, May 14, 2008
Ask questions of developers, but only if you know what you’re talking about. One surprisingly common impediment to progress is the reluctance of end-users to seek support, or their inability to get this support when it’s needed. One problem is that developers tend to get unresponsive (or pissy) when consumers of their applications are constantly asking them questions that are mundane or naive. To a degree, this is reasonable. These people have already put a ton of time into helping you, and are justified in recoiling if you seem unwilling to put the same time into helping yourself. As a matter of respect, you should read over the instructions and try some basic troubleshooting on your own before asking the developer for help directly. We don’t learn programs with someone holding our hand; we learn them primarily through experimentation, trial, and error. Having said this, of course, some developers may deserve a bit of hassling if they haven’t taken even the most basic measures to make their software accessible to the public. Moreover, it’s important to remember that most developers are your colleagues and are eager to communicate with informed users of their applications. They’re even hoping for your help spreading their methods, extending their application, and catching bugs.
Monday, May 12, 2008
In Monkey Trials & Gorilla Sermons, Peter Bowler offers a comprehensive history of the debate between evolution and religion. Breaking from the growing, aggressively anti-religious sentiment of other recent treatments of the clash between science and religion (e.g., Dawkins, Dennett, Hitchens, Harris), Bowler strikes an intentionally conciliatory tone. Indeed, he begins by noting that the purpose of his book is to show that “a rigidly polarized model of this relationship [between science and religion] benefits only those who want us to believe that no compromise is possible” (p. 3). His effort to be objective has led to a balanced, insightful treatment that identifies subtleties in both sides of the debate.
He devotes considerable attention to the diversity of Christian perspective and its historical evolution: why do some Christians accept evolution without a fuss while others consider it a literal battle for their souls that cannot be lost? Bowler follows Ruse in dichotomizing Christians into pre- and postmillennialist factions, and suggesting that this philosophical distinction is tied to one’s willingness to accept something other than a literal interpretation of Genesis. To the premillennialists, a group that includes most modern evangelicals in the United States, accepting evolution not only rejects their belief in a literal interpretation of the bible, but also defies their core conviction that humanity is fundamentally sinful and incapable of improving our situation on earth. An interesting point, and one that seems theologically important. I’m not sure I’m ready to grant this depth of understanding to most practicing evangelicals, but there must be something beyond textual misinterpretation inspiring their ignorance.
A minor gripe: there are some cases that suggest a comparative perspective has not been fully considered. On page 22, for example, it is suggested that “...Christianity is unique among religions in seeing suffering as an integral part of the relationship between the human and the divine.” What about Buddhism, whose four noble truths begin with “life means suffering”?
In addition to critical analysis of religion’s perspective, Bowler also provides insight into the perspective of the ‘Darwinians’. In his final chapter (“Modern Debates”), for example, he considers the basis for Dawkins and Dennett’s rejection of religions as memes whose conflict will inevitably be harmful to society.
I’m sure everyone on both sides will find something to complain about here, but hopefully they won’t let this stop them from reading this contribution.
Sunday, May 11, 2008
Saturday, May 10, 2008
Isabella Rossellini made several terrific short films, now posted on the Sundance Channel, about the sexual practices and natural history of six insect species, spiders, and earthworms. Botanists are hopeful for a far more interesting and kinky plant porn series, which is perhaps forthcoming. (She fixes the cable?)
Learn new programs. If you’re going to do modern phylogenetic analyses you are going to be learning new programs. All the time. During my ten years in this field I’ve learned to use well over 50 programs. Some of these programs are not easy to use. Some take weeks, months, or even years to master. You cannot allow this to lead you on a detour of convenience to use of inappropriate analyses. It is your responsibility as a scientist to do the best analyses possible. Don’t be lazy: when you stop learning new programs you stop doing modern phylogenetics.
Thursday, May 8, 2008
I love the figure from the Nature news piece about the platypus genome. Of course, I was delighted to see that the "Green anole lizard" was included and noted as one of the species whose genome sequencing has been completed. The funny thing is that the Dodo is included as "Missing/Required." Who has the dodo on their list of the top four bird species for genome sequencing? Let's not forget that the Dodo is just a glorified pigeon. I'm not sure we'd learn anything from sequencing a dodo genome that we couldn't also learn from sequencing the genome of a rock dove. I suspect it might also be a bit easier to get DNA from a rock dove...
The platypus genome is out. The coolest result is that the venom produced by the male platypus is derived from some of the same gene families that were coopted for venom production in reptiles. An amazing example of independent evolution, to be sure.
I was really excited to see some phylogenies in this paper and looking forward to learning all about how they were made. After browsing the paper and skimming through the copious supplemental material, however, I have yet to find any information on how the tree in Figure 1 was obtained (see image). A program called NJTree (since renamed TreeBest) is mentioned in the supplemental material, but it's not clear which analyses this program was used for, or which of its algorithms were used (it seems capable of building trees via both neighbor-joining [a distance-based method that is decades old and riddled with problems] and 'extended' maximum likelihood [a method that I've never run across]). Since we can't learn anything from methods that don't exist, we're left to ponder just one question: How can a paper with ~100 authors not include a single meaningful sentence about the methods used to produce at least two of its five figures?
Wednesday, May 7, 2008
Recently, I participated in a "hackathon" sponsored by NESCent, the National Evolutionary Synthesis Center. The goal of this weeklong meeting was to gather together programmers who are writing comparative algorithms in the R software language.
If you haven't discovered R yet, it's a free and very powerful platform for carrying out all sorts of statistical analyses. R syntax is a little difficult to master, but it is really worth learning. You can download R here, and find free documentation to learn the language here.
The great thing about R is that people have written all sorts of useful packages to do various things. Most of the phylogenetic comparative approaches available in R are based on Emmanuel Paradis' ape package. My package, geiger, for example, is dependent on the framework provided by ape.
The point of this post, though, is to point out that the hackathon produced a product of great usefulness to the community: the R-phylo wiki. This wiki has detailed instructions for carrying out all sorts of comparative analyses, from independent contrasts to disparity-through-time, in R. Enjoy!
Don’t be intimidated by command line only applications. We all love programs with beautiful graphical user interfaces (GUIs). We should all use a few moments of the time these interfaces have saved us to thank the developers who have used many hours of their own time to develop them. For brand-new and highly specialized analyses, developers are justified in making their methods available only in the form of somewhat-more-difficult-to-use text-based applications. If you are going to do phylogenetics right you must learn to use these applications. Early in this process, you will be doing yourself a favor if you learn basic UNIX syntax and file architecture.
Tuesday, May 6, 2008
Get a text editor and use it. Ironically, the most advanced programs often require the simplest input: ASCII text files. The best way to avoid problems with these types of files is to never use an advanced word processor like Microsoft Office or Mac OSX’s Pages. Don’t even use the simpler text editors that were included in the base install of your operating system (e.g., TextEdit in Mac OSX or Notepad in Windows). Go straight to your friend the internet and download either TextWrangler (Mac OSX) or TextPad (Windows) (if you're on a UNIX platform you don't need my help!). In addition to sparing you the unbelievable amount of confusion that can result from hidden formatting or invisible extensions, these programs are wonderfully easy to use and full of useful features (it nearly blew my mind to learn that I could use the option key to select columns of text in TextWrangler). Use these programs to create, edit, revise, and review all of the files that will be input to or output from text-based applications.
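If you suspect a file has picked up hidden formatting somewhere along the way, a quick check along the following lines can help. This is only a sketch: the function name and the sample input are made up, and it looks for just two common culprits, carriage returns and non-ASCII bytes (curly quotes pasted from a word processor, for instance).

```python
def find_text_problems(data: bytes):
    """Return (line number, problem) pairs for suspicious lines."""
    problems = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        if b"\r" in line:
            problems.append((line_no, "carriage return"))
        if any(byte > 127 for byte in line):
            problems.append((line_no, "non-ASCII byte"))
    return problems


# Made-up input: a Windows line ending on line 1 and a curly
# apostrophe on line 3.
sample = "#NEXUS\r\nbegin data;\n  taxon\u2019s name\nend;\n".encode("utf-8")
print(find_text_problems(sample))
# [(1, 'carriage return'), (3, 'non-ASCII byte')]
```

To check a real file, read it in binary mode and pass the bytes in, e.g. `find_text_problems(open(path, "rb").read())`, where `path` is whatever input file you are about to feed to your analysis program.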
Friday, May 2, 2008
Selecting an appropriate burn-in point is critical for Bayesian analyses. Many people continue to do this arbitrarily, by excluding, for example, the first 10% of trees sampled. This seems silly and wrong. Others visualize their posterior scores in programs like Tracer or Microsoft Excel and eliminate the set of initial trees possessing likelihood scores that are obvious outliers. This is also a bit arbitrary. An even more sophisticated approach involves visualization of split posteriors using the on-line application AWTY (Are We There Yet). Seems like a cool idea, leading me to wonder why it isn't more widely used. Probably because it's more or less impossible to figure out what it's doing!
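The first two approaches are easy to sketch. Below, `drop_fixed_fraction` is the arbitrary "first 10%" rule, and `first_sample_at_plateau` is one crude stand-in for eyeballing a likelihood trace: it finds the first sample whose log-likelihood reaches the mean of the second half of the run. That heuristic and both function names are my own assumptions for illustration; this is not what Tracer or AWTY actually does, and the trace is invented.

```python
def drop_fixed_fraction(samples, fraction=0.10):
    """Discard the first `fraction` of samples, whatever they look like."""
    cut = int(len(samples) * fraction)
    return samples[cut:]


def first_sample_at_plateau(log_likelihoods):
    """Index of the first sample that reaches the late-run mean lnL.

    Assumes the second half of the chain has already levelled off,
    which is itself something that needs checking.
    """
    if not log_likelihoods:
        return 0
    second_half = log_likelihoods[len(log_likelihoods) // 2:]
    plateau = sum(second_half) / len(second_half)
    for index, lnl in enumerate(log_likelihoods):
        if lnl >= plateau:
            return index
    return len(log_likelihoods)


# Invented trace: a steep climb followed by a plateau near -3000.
trace = [-5200.0, -4100.0, -3500.0, -3010.0, -3002.0,
         -2998.0, -3001.0, -2999.0, -3000.0, -3002.0]

print(len(drop_fixed_fraction(trace)))   # 9 samples kept
print(first_sample_at_plateau(trace))    # 5
```

On this toy trace the 10% rule keeps four samples that are still climbing, which is exactly why a fixed cutoff is worth questioning. Neither function says anything about whether the split posteriors have stabilized.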
Thursday, May 1, 2008
A new paper from Ally Phillimore and Trevor Price, Density-Dependent Cladogenesis in Birds, has just been published in PLoS Biology. In this important paper, Phillimore and Price argue that phylogenetic trees of birds show a general pattern of diversification rates that slow through time.
This pattern is revealed by the branches in these trees. If you think about each branch on the tree as an interval between two speciation events, then the lengths of those branches should be related to diversification rates; if diversification rates are high, speciation events will be closer together, and the branches in the trees will be shorter. The authors suggest that one explanation for this pattern is density-dependent cladogenesis. That is, the rate of diversification slows as species accumulate.
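Here is a toy calculation of that logic, under assumptions that are mine rather than the authors': no extinction, and a per-lineage speciation rate that either stays constant or declines linearly as the clade fills up toward a hypothetical carrying capacity. With k lineages each speciating at rate r, the expected wait until the next speciation event is 1 / (k * r).

```python
def expected_waiting_times(n_final, rate_for):
    """Expected time between speciation events while a clade grows
    from 2 lineages up to n_final, given a per-lineage rate function."""
    return [1.0 / (k * rate_for(k)) for k in range(2, n_final)]


def constant_rate(k, lam=0.5):
    """Per-lineage speciation rate that ignores how many species exist."""
    return lam


def density_dependent_rate(k, lam0=0.5, capacity=20):
    """Rate that falls linearly as k approaches a hypothetical capacity."""
    return lam0 * (1 - k / capacity)


constant = expected_waiting_times(10, constant_rate)
crowded = expected_waiting_times(10, density_dependent_rate)

# How much longer each interval is under density dependence.
ratios = [c / k for c, k in zip(crowded, constant)]
print([round(r, 2) for r in ratios])
```

The ratios grow with every speciation event, so the later intervals are stretched relative to the constant-rate expectation and the tree ends up with relatively long branches toward the tips.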
I like this paper because this particular tree shape, with short branches near the root, is one that I commonly observe in my own phylogenetic trees. If these results are general, we might be able to confirm something that evolutionary biologists have suspected: species interactions affect diversification rates. We postulated in 2003 (paper here) that such early bursts in diversification rate might be associated with bursts in the rate of morphological evolution, another characteristic pattern of adaptive radiation. This hypothesis has not yet been evaluated across a large enough number of trees to form any conclusions.
This paper reminds me of another general pattern in macroevolution: phylogenetic trees are more imbalanced than one would expect based on most null models (see Mooers and Heard 1997 & Blum and Francois 2006). Such regular patterns at macroevolutionary scales are hard to come by. When we find general patterns at this level, we have learned something deep about the process of evolution. Interestingly, there seems to be some relationship between imbalance and slowdowns in the Phillimore analysis; more imbalanced trees tend to show stronger slowdowns. Perhaps this is because the density dependence sometimes has a lineage-specific component.
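For a feel of what "imbalance" means in practice, one widely used summary is the Colless index: for every internal node, take the absolute difference in the number of tips on its two sides, and add these up. I am not claiming this is the measure used in the papers above; the sketch below just assumes a rooted binary tree written as nested 2-tuples, with a made-up function name.

```python
def colless_index(tree):
    """Return (number of tips, Colless imbalance) for a rooted binary
    tree given as nested 2-tuples with tip labels at the leaves."""
    if not isinstance(tree, tuple):
        return 1, 0  # a single tip contributes no imbalance
    left, right = tree
    n_left, c_left = colless_index(left)
    n_right, c_right = colless_index(right)
    return n_left + n_right, c_left + c_right + abs(n_left - n_right)


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")

print(colless_index(balanced))     # (4, 0)
print(colless_index(caterpillar))  # (4, 3)
```

A perfectly symmetric tree scores 0 and a fully pectinate ("caterpillar") tree scores the maximum for its size, so real trees can be placed somewhere between the two and compared with what a null model predicts.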
Also, don't miss the nifty set of simulations in this paper showing that there is a slight bias in the test the authors are using, but that the pattern in the data is too strong to be explained by that bias.