Monday, January 11, 2010

Priors and convergence in BEST

Over the past year, I’ve heard a bit of grumbling about how difficult it is to achieve convergence using the program BEST ( Bayesian Estimation of Species Trees ). I was recently using BEST to analyze a reasonable-size multilocus dataset (20 taxa, 5 loci), but things were looking grim: convergence was not happening, and the trees looked nothing like concatenated and single locus trees estimated using RAxML. Then I came across this nice paper by Adam Leache , where he demonstrates a strong effect of priors on convergence in BEST. BEST requires setting two priors that are specific to the hierarchical species tree model: a prior on species population sizes (theta; thetapr), and a prior on the relative gene mutation rates (GeneMuPr). Leache showed that by increasing the mean of thetapr, he was able to obtain much faster convergence (inset).

Inspired, I set up a series of runs with my data where I increased the mean of thetapr – up to shape=3 and scale = 0.1 [for a mean of scale / (shape – 1)]. I found that performance improved immediately and dramatically: I was achieving significantly higher log-likelihoods within 200K gens of sampling than I had previously found in tens of millions of generations. After just 30m generations – a single night’s worth of sampling – my analyses pass a battery of convergence tests, including AWTY-based sliding window analyses of the actual species tree sample. While I’ll run this out for a few more days to see what happens, the results are pretty encouraging.

As an aside, I found that the default prior on gene mutation rates (uniform on 0.5 – 1.5) is not adequate if you have substantial among-gene heterogeneity in rates. I had mtDNA mixed with nuclear loci in my analysis, and the mtDNA rates were far too high for the defaults: the estimated mtDNA rates were simply piling up on the upper bound of the prior distribution (1.5). Because the mean rate across all loci is 1, the ratio of the bounds of this distribution represent the theoretical maximum relative rate difference that you will allow to occur within your dataset. If you have K loci, the theoretical maximum value this can take is K (which, if observed, would require that you have K-1 loci with relative rates approaching zero). So, I suggest using a uniform (0, K) prior on this parameter – it is a uniform distribution, so using an overly broad range isn’t likely to have any pathological consequences for your analyses – and this seems much better than the defaults, which allows at most a 3-fold difference in mutation rates among loci (eg, upper = 1.5 divided by lower = 0.5).


Glor said...

Thanks Dan, some useful words of wisdom here.

Unknown said...

Awesome- thanks for succinct but diagnostic thoughts! Perhaps there is still hope for some of our own datasets.

Unknown said...

One thing to note (and Adam mentions this) - the value of theta that had the greatest trouble converging was v. small (0.00015) which is likely unrealistically small. Another general rule of thumb for BEST - running lots of steps seems to be quite important - we're running 2 billion steps on a dataset with 8 tips and 40 loci.

fdelsuc said...

Thanks for pointing out these very interesting observations.