The BEST website provides example files for analyses in which a single allele is sampled from each species, as well as analyses in which multiple alleles are sampled from each species. Both analyses assume that species are reciprocally monophyletic. Given that BEST is a modification of MrBayes, the data formats are very similar except that BEST includes priors for theta and mu. Also, the tree topologies, branch lengths, and mu are unlinked across the sampled loci. If a haploid locus (mitochondrial or chloroplast DNA) is sampled, BEST allows the user to define the ploidy of the locus (default setting is diploid).
If a multiple locus dataset is run in BEST, a sham file is needed to summarize the trees after the burnin (the familiar "sumt" command from MrBayes). Liang Liu has posted an example of this type of file on the BEST website.
The trees are summarized using a burnin value that discards all trees and parameter values sampled prior to convergence. As in MrBayes, summarizing the trees produces a consensus tree file, where the consensus percentages for clades are interpreted as the Bayesian posterior probability. Progress of the BEST run and assessment of convergence can be monitored using the computer program Tracer.
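To make the burnin and consensus-percentage idea concrete, here is a minimal Python sketch. It is not how BEST or MrBayes implement "sumt"; the function name, the toy tree format (each sampled tree as a set of clades, each clade a frozenset of taxon names), and the six-sample chain are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def clade_support(sampled_trees, burnin):
    """Fraction of post-burnin trees that contain each clade.

    sampled_trees: trees in sampling order; each tree is a set of clades,
                   and each clade is a frozenset of taxon names (toy format).
    burnin: number of initial samples to discard.
    """
    kept = sampled_trees[burnin:]
    if not kept:
        raise ValueError("burnin discards every sampled tree")
    # Each tree is a set, so a clade is counted at most once per tree.
    counts = Counter(clade for tree in kept for clade in tree)
    return {clade: n / len(kept) for clade, n in counts.items()}


# Hypothetical chain of six sampled trees for taxa A, B, C.
AB = frozenset({"A", "B"})
BC = frozenset({"B", "C"})
ABC = frozenset({"A", "B", "C"})
chain = [{BC, ABC}, {BC, ABC}, {AB, ABC}, {AB, ABC}, {BC, ABC}, {AB, ABC}]

# Discard the first two samples as burnin, then summarize the rest.
support = clade_support(chain, burnin=2)
for clade, freq in sorted(support.items(), key=lambda kv: -kv[1]):
    print(sorted(clade), freq)
```

In this toy chain, clade (A, B) appears in three of the four retained trees, so its consensus percentage, read as a posterior probability, is 0.75. Choosing the burnin value itself still depends on inspecting the run for convergence, for example in Tracer.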
My laboratory group has been experimenting with BEST for the past few months, and we are generating some interesting and exciting results. The prior on theta appears to be the one issue/nuisance that we have run across in our explorations using BEST. A fairly wide prior is given in the example files. We are beginning to run BEST with narrower, and more realistic, priors for theta. So far the results are promising.
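One rough way to compare a "wide" prior with a "narrow" one is to look at the prior's mean and standard deviation. The Python sketch below assumes the theta prior is an inverse gamma distribution with shape alpha and scale beta; that distributional form, the function name, and every number used are assumptions for illustration and are not taken from the example files on the BEST website.

```python
import math


def invgamma_mean_sd(alpha, beta):
    """Mean and standard deviation of an inverse gamma(alpha, beta) prior.

    mean = beta / (alpha - 1)                          (requires alpha > 1)
    var  = beta**2 / ((alpha - 1)**2 * (alpha - 2))    (requires alpha > 2)
    """
    if alpha <= 2:
        raise ValueError("the variance is finite only when alpha > 2")
    mean = beta / (alpha - 1)
    sd = mean / math.sqrt(alpha - 2)
    return mean, sd


# Hypothetical settings: same prior mean for theta, different spread.
wide_mean, wide_sd = invgamma_mean_sd(alpha=3, beta=0.003)
narrow_mean, narrow_sd = invgamma_mean_sd(alpha=11, beta=0.015)

print(f"wide:   mean={wide_mean:.4f}  sd={wide_sd:.4f}")
print(f"narrow: mean={narrow_mean:.4f}  sd={narrow_sd:.4f}")
```

Under these assumed values both priors are centered on theta = 0.0015, but the second has one third of the standard deviation. Raising alpha while scaling beta to hold the mean fixed is one simple way to tighten a prior around a value thought to be realistic for the organisms in question.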
Overall, I have found BEST straightforward to implement with my multilocus phylogenetic data. Familiarity with MrBayes will certainly help new users of BEST. Also, Liang Liu has been very helpful and encouraging to users, and has implemented suggestions into the example files on the BEST website. My entire lab group is excited to be exploring the frontier of phylogenetics, with the hope that we are making the most reasonable inferences regarding species relationships that are afforded by our hard-earned data.
7 comments:
I've been interested in the new dimension BEST adds to trees (branch width) since I first read these papers. I am curious to know if, despite values that range widely depending on the prior used, theta remains proportional across analyses. Also, are you noticing any cool patterns related to this value, like autocorrelation, or fluctuations corresponding with presumed instances of dispersal or ecological shifts?
Dan, not sure. You are welcome to have at the parameter files. Let me know and I can get them up on my lab server. Frankly, I think that the time to explore these questions is now, and given your interest in computational phylogenetics you can make a nice contribution.
Although this shouldn't have an impact on the content of this string, I wanted to clear up a potential case of identity confusion. My graduate student - Dan Scantlebury - posted this as "dans"; Dan Warren at UC Davis has been posting as "Dan". Both are interested in computational phylogenetics, but perhaps we need to give these guys some nicknames...
We have used BEST in my lab for some time now. If you violate any of the assumptions (and there are many...e.g., no horizontal gene transfer) or certain coalescent requirements, don't be surprised to find little resolution and support in your trees relative to partitioned BI or ML analyses. See the paper by Belfiore et al. (2008) on Geomyidae in Syst. Biol.
Moreover, violations of these assumptions will produce harmonic mean -lnL's that are much lower than the more resolved, standard partitioned BI analyses.
If you meet the assumptions of BEST species trees obtained from joint posterior probabilities of gene trees, then you are golden!
In my previous comment, I wasn't saying that the standard BI or ML trees are actually better...they may simply have more resolution (although this may be artificial). In contrast, the lack of resolution in BEST may also be artificial.
Frank, has your group converged on ways to assess particular coalescent aspects of the model, and how have you varied the theta prior in your runs?
Hey Tom---
Well, theta is a huge problem and I am sure the defaults are too wide. We are exploring this now. Also, it is not entirely clear at what phylogeographic/phylogenetic level the assumptions of BEST will be violated. For instance, will assessing species relationships for organisms that diverged >10 mya be out of bounds for this? Edwards thinks not..right?