Saturday, July 26, 2008

R Tip: Indicating Tree Support

I can't take any more of the tiny, unreadable posterior probability and bootstrap support values that I've been seeing on phylogenetic trees at the Ichs and Herps Meeting. Wouldn't it be easier to put some easy-to-read symbols on the nodes instead of text in 4-point font? I understand why people haven't done this in the past -- it would have required individually replacing the text with a symbol at each node. Thanks to R, however, this tedious process is no longer necessary. Below are some simple instructions for doing this with the posterior probability values included in the '.con' trees that MrBayes outputs from the sumt command (see further details in Paradis' book on R for phylogenetics). If you've never used R for phylogenetics before, you might also want to start at the new R-phylo Wiki.

#First, load the required library.
library(ape)

#Next, get your consensus tree from MrBayes into R.
read.nexus("your_file.con") -> your_tree

#Then, simplify matters by using only the tree with PP values.
your_tree[[1]] -> your_tree
#Tell R to save the resulting tree file in PDF format (the file name here is just a placeholder).
pdf("your_tree.pdf")
#Generate the vector required to store values for background colors for symbols.
p <- character(length(your_tree$node.label))
#The following three lines define your labeling scheme.
p[your_tree$node.label >= 0.95] <- "black"
p[your_tree$node.label < 0.95 & your_tree$node.label >= 0.75] <- "gray"
p[your_tree$node.label < 0.75] <- "white"
#Almost done, you're ready to plot your tree.
plot(your_tree)
#Now label your tree: 'pch' tells R to use filled circles, 'cex' defines the size of the circles, and 'bg' tells it the name of the vector including the fill colors.
nodelabels(pch=21, cex = .75, bg = p)
#Finally, turn off the PDF writing.
dev.off()

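One thing to watch in the labeling scheme above: node labels are read in as text, so the comparisons against 0.95 and 0.75 are made on strings rather than numbers. The sketch below is one possible way to make the thresholds explicitly numeric. The helper name `support_colors` and its `high` and `low` arguments are made up for illustration (they are not part of ape), and the support values are invented test data, not results from a real tree.

```r
# Hypothetical helper (not part of ape): map support values to fill colors.
support_colors <- function(labels, high = 0.95, low = 0.75) {
  # Node labels arrive as text, so convert them to numbers first.
  # Empty or non-numeric labels (e.g. at the root) become NA.
  support <- suppressWarnings(as.numeric(labels))
  fill <- rep(NA_character_, length(support))
  known <- !is.na(support)
  fill[known & support >= high] <- "black"
  fill[known & support < high & support >= low] <- "gray"
  fill[known & support < low] <- "white"
  fill
}

# Invented support values, including an empty label like the one at the root.
example_labels <- c("", "1.00", "0.95", "0.80", "0.75", "0.52")
support_colors(example_labels)
```

With the tree loaded as above, the result could be passed straight to the plotting step, for example `nodelabels(pch = 21, cex = .75, bg = support_colors(your_tree$node.label))`. Nodes without a usable support value get `NA` as their fill, which should leave their circle unfilled rather than coloring it white.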

Anonymous said...

Any size is better than none, and in my own package you get none in the graphic drawings of the trees. It is very hard to figure out how to put numbers on tiny little branches. Even symbols are a stop-gap measure and with enough, and small enough, branches that will fail too.

As an aside, you would not believe how many times I have gotten the complaint (for the tree-drawing programs Drawtree and Drawgram in PHYLIP) that the user is upset because they had the program draw their 800-tip tree on a single page and the species names were too small to read! I am mystified as to what else they expected to happen in that case. (I have options to draw the tree over multiple pages, but they don't even think of using that).

Susan Perkins said...

I always say - the bane of systematics is that the more data you collect, the more illegible your slide (or figure) is. It's always important to consider what you want your audience (or viewers) to take away. I've tended to favor the dots on nodes that are well supported - it is an immediate visual of a well-resolved tree - and then zoom in on clades of interest or be sure to discuss them so the point is clear. Not every node is (equally) interesting.

sergios-orestis kolokotronis said...

As long as node support numbers are explained in a little text box or in the legend (I prefer the former), shapes and colors are the way to go. R and Ape, specifically, are great for this purpose. Displaying the overall topology and magnifying interesting subtrees is the best move, to echo Susan. Rod Page among others has been working on interactive tree viewers.

Another way is to color internode branches in FigTree. One can specify not only colors, but the range of colors corresponding to a range of node support values as well as a line width range, so that internode branches leading to highly supported clades are thicker, and so on.

Anonymous said...

Hey Rich,

thanks for the script! I think this line should be changed to >= 0.95 or none of the high support nodes get labelled!

p[your_tree$node.label <= 0.95] <- "black"

Glor said...

Thanks man. It should be fixed now. I screwed up when converting my original into HTML format with appropriate tags for math symbols.