## Saturday, October 18, 2008

### R Tip: Labeling Trees w/ Posterior Probability and Bootstrap Support

#First we need to load the necessary libraries
library(ape)
library(geiger)
#The getAllSubtrees function below is a necessary helper that atomizes a tree into its individual subclades; it was provided courtesy of Luke Harmon.
getAllSubtrees <- function(phy, minSize=2) {
  res <- list()
  count <- 1
  ntip <- length(phy$tip.label)
  for(i in 1:phy$Nnode) {
    l <- node.leaves(phy, ntip+i) #Tip labels descended from internal node ntip+i
    bt <- match(phy$tip.label, l)
    if(sum(is.na(bt))==0) {
      st <- phy #Every tip descends from this node, so the subclade is the whole tree
    } else st <- drop.tip(phy, phy$tip.label[is.na(bt)]) #Prune every tip outside the subclade
    if(length(st$tip.label)>=minSize) {
      res[[count]] <- st
      count <- count+1
    }
  }
  res #Return the list of subclades
}

#The plotBayesBoot function below plots both posterior probability and bootstrap values on each node of the consensus tree obtained from your Bayesian analysis. Bootstrap values will appear in bold text immediately below and to the left of the node they support, whereas Bayesian posterior probabilities will appear in regular face above and to the left of the node.

plotBayesBoot <- function(bayesTree, bootTree) {
  getAllSubtrees(bayesTree)->bayesSub
  getAllSubtrees(bootTree)->bootSub
  bootList<-matrix("<50", Nnode(bayesTree), 1) #Nodes with no matching bootstrap clade keep the "<50" label
  #The commands below compare every subclade in the Bayes tree to every subclade in the bootstrap tree, and vice versa, and identify all those clades that are identical.
  for(i in 1:Nnode(bayesTree)) {
    for(j in 1:Nnode(bootTree)) {
      match(bayesSub[[i]]$tip.label[order(bayesSub[[i]]$tip.label)], bootSub[[j]]$tip.label[order(bootSub[[j]]$tip.label)])->shared
      match(bootSub[[j]]$tip.label[order(bootSub[[j]]$tip.label)], bayesSub[[i]]$tip.label[order(bayesSub[[i]]$tip.label)])->shared2
      if(sum(is.na(c(shared,shared2)))==0) {
        bootTree$node.label[j]->bootList[i]
      }
    }
  }
  plot(bayesTree, cex=1, lwd=0.5) #Plots your Bayesian consensus tree
  nodelabels(bayesTree$node.label, adj=c(1.2, -0.3), frame="n", cex=1, font=1) #Adds posterior probability values above and to the left of each node. Change the 'cex' value to make the font smaller or larger. A value of 1 will give you a readable result in the R quartz window, but a value closer to 0.25 might be better for publication.
  nodelabels(bootList, adj=c(1.2, 1.3), frame="n", cex=1, font=2) #Adds bootstrap values in bold below and to the left of each node. This call is absent from the listing as posted; the 'adj' values used here are an assumed placement.
}

read.nexus("yourBayesTree.con")->bayesTree #Reads in the .con file that results from analyses in MrBayes.
bayesTree[[1]]->bayesTree #Extracts one of the two trees in the .con file.
read.nexus("yourBootTree.nex")->bootTree #Reads in the consensus tree from a bootstrap analysis in PAUP.
plotBayesBoot(bayesTree, bootTree)
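
The clade comparison inside plotBayesBoot checks, in both directions, that match() finds every tip label of one subclade in the other. A minimal sketch of that test in base R suggests it amounts to asking whether the two sets of labels are equal. The helper name `sameClade` and the tip labels are made up for illustration and are not part of the original script.

```r
# Hypothetical helper: TRUE when two clades contain exactly the same tips.
sameClade <- function(tipsA, tipsB) {
  shared  <- match(tipsA, tipsB)  # NA wherever a tip of A is absent from B
  shared2 <- match(tipsB, tipsA)  # NA wherever a tip of B is absent from A
  sum(is.na(c(shared, shared2))) == 0
}

# Made-up tip labels, for illustration only.
bayesClade <- c("sp_c", "sp_a", "sp_b")
bootSame   <- c("sp_a", "sp_b", "sp_c")
bootOther  <- c("sp_a", "sp_b", "sp_d")

sameClade(bayesClade, bootSame)   # same tips listed in a different order
sameClade(bayesClade, bootOther)  # sp_c and sp_d are not shared
setequal(bayesClade, bootSame)    # base R shortcut for the same question
```

Because match() ignores the order of its arguments' elements, sorting the labels first should not change the result, and setequal() could probably stand in for both match() calls.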

Todd Jackman said...

Thanks, Rich - this (you giving detailed instructions) is a great way to slowly learn R without having to read the giant R book much.
Can you rotate branches or ladderize?
This is important for showing multiple similar trees or for getting the trees to match geography.
Also, if the Bayes consensus has polytomies, I would prefer showing the Bayes tree with best ML score, but that can be done easily.

Luke J. Harmon said...

There is an R function in ape, ladderize(), that will ladderize a tree for plotting. If you run your tree through that before running Rich's script, I think it will do what you want.
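
As a rough illustration of the suggestion above, the sketch below ladderizes a toy tree with ape before plotting it. The tree and its node labels are invented stand-ins for posterior probabilities; with real data you would presumably pass your own consensus tree instead.

```r
library(ape)

# Toy tree; the node labels are made-up values standing in for posterior probabilities.
toyTree <- read.tree(text = "((A,B)0.98,(C,(D,(E,F)0.75)0.91)1.00)1.00;")

# ladderize() reorders the branches so that clades are sorted by size in the plot.
ladTree <- ladderize(toyTree, right = FALSE)

plot(ladTree)
nodelabels(ladTree$node.label, adj = c(1.2, -0.3), frame = "n")
```

Setting `right = TRUE` ladderizes in the opposite direction. For swapping the clades at a single node, ape also provides rotate().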

Simon said...

Have you tried FigTree? It's quite clever as it'll let you plot anything stored as a comment in the Newick description on the tree nodes/edges. So, you give it something like this:

(taxa1[bootstrap=100&pp=0.99],taxa2[bootstrap=85&pp=0.88])...etc

...then you can choose what/how to label the nodes. BEAST automatically stuffs every single parameter into the tree description for FigTree to use, but it's fairly trivial to reverse engineer other software to do the same.

--Simon

Jeet Sukumaran said...

Hi guys,

I wrote SumTrees specifically for this purpose.

http://jeetworks.org/programs/sumtrees

Takes one or more tree files as input, and a target tree file, and plots split support based on the former onto the latter.

It also has some other options such as skipping a burn-in, composing a consensus tree if a target tree is not given, etc.
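
For readers who want to stay in R, the general idea of mapping support from a set of trees onto a target tree can be sketched with ape's prop.clades(). This is not SumTrees and makes no claim to reproduce its behavior; the trees below are made up, and a real analysis would presumably read a post-burn-in sample from file.

```r
library(ape)

# Target tree plus a small made-up sample of trees (stand-ins for a posterior sample).
target <- read.tree(text = "((A,B),(C,D));")
treeSample <- c(
  read.tree(text = "((A,B),(C,D));"),
  read.tree(text = "((A,B),(C,D));"),
  read.tree(text = "((A,C),(B,D));")
)

# Count how many of the sampled trees contain each clade of the target tree,
# then convert the counts to proportions.
cladeCounts <- prop.clades(target, treeSample, rooted = TRUE)
support <- cladeCounts / length(treeSample)

plot(target)
nodelabels(round(support, 2), adj = c(1.2, -0.3), frame = "n")
```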

Glor said...

Jeet - Thanks for pointing out your program. I have yet to give it a try, but it sounds very useful. It will be nice to be able to plot from the original data rather than from the consensus trees, as I've done with the R script. It sounds like it will also make it easier to do what Todd has suggested and plot support values on the tree with the highest ML score.

Jeet Sukumaran said...

Rich:

Yep, in fact the original motivation was to do exactly what Todd requested. But then I later decided that I sometimes might be interested in summing clade posterior probabilities over multiple tree files as well, but *not* mapping it onto an ML tree (i.e., I wanted clade support but wanted to integrate out topology). So I added the burn-in and consensus tree option. When I get around to the next version release, I'd like to add an option to summarize the posterior probabilities of node ages (=tip-to-node distance) as opposed to just edge lengths as it does now.

By the way, great site---it's definitely one of my regular/daily ports-of-call now!

-- jeet

Glor said...

Although I have yet to try this approach, Casey Dunn suggested that Phyutility - the application he wrote with Stephen Smith - will also perform the type of analyses implemented by Jeet's SumTrees program.

Nate U said...

Hi Rich-- does this function still work for you, or have you since updated it? I just tried on R v3.0.1 and am getting a series of errors. Seems like a useful tool so I'd like to give it a try!
Thanks

Carol VERHEECKE said...

use Mega6 !