Tuesday, March 31, 2009

Hey R Users! Time to Update ape

The ape package written by Emmanuel Paradis is the foundation for phylogenetic analyses in R. Yesterday, Paradis and his coauthors posted a new version (3.2) on the CRAN archive yesterday. There don't seem to be too many new functions, but there are some important bug fixes. One these - preventing calculation of negative state probabilities when reconstructing discrete character states - solves one of the more vexing problems I've had with the ace function. You should definitely get the update if you're doing ancestral reconstruction of discretely coded traits! Now we just need to hope the April 17th upgrade to R 2.9 goes smoothly...

10 comments:

Anonymous said...
This comment has been removed by a blog administrator.
Luke J. Harmon said...

This does make ace work better, but I would still take care with its reconstructions. ACE treats all ancestral states in the tree as free parameters, and maximizes the likelihood across all of these parameters (and the model parameters) simultaneously. This works to the extent that the likelihood optimizer finds the true peak on the likelihood surface - that is, sometimes, not always, especially with large data sets. All other programs that I know of (PAUP, Mesquite, etc.) use a well-known algorithmic solution to solve this problem, which returns the right answer no matter what the input data.

Also, the link joke is not clever anymore, anon.

Glor said...

Excellent point Luke. When are we going to see your version of ancestral reconstruction for R?

Luke J. Harmon said...

It's done for continuous characters, still working on discrete. By summer, I'll put both in GEIGER.

Luke J. Harmon said...

If anyone wants the continuous ancestral character state code, shoot me an email.

Brian R. Moore said...

I'm a big fan of the ape package, and this new release is indeed great news.

I think that R is great for quickly implementing proof-of-concept code, for computationally light tasks, and, of course, for implementations that can exploit the flexible graphical support of the R framework.

However, I've noticed a somewhat curious trend among hard-core R users, in which they seem to really want to perform all analyses within this framework (i.e., I've often heard this perspective expressed along the lines of "Wouldn't it be great if everything was written in R!").

All this enthusiasm has generated a number of wonderful R packages, but I sometimes worry if this might lead to the implementation and use of methods in R that compromise methodological rigor for the sake of computational viability in R.

Consider the case of estimating divergence times under a relaxed clock: the only method currently implemented in R (so far as I am aware) is non-parametric rate smoothing (NPRS), which, although fast, is known to over fit substitution rate variation relative to more computationally expensive (semi)parametric methods. I think we can all agree that it would be unfortunate if R users were to adopt an inferior method for the sake of convenience (i.e., limiting their choices to the 'light' methods that can viably be implemented in R).

Die haRds typically respond to this sort of concern by pointing out that it is possible to accommodate more computationally intensive methods by wrapping R around more efficient code (written e.g., C, C++, java, etc.). For example, you could imagine writing an R wrapper for the BEAST java code to allow R users access to the more general relaxed-clock methods that it implements.

I guess my question is 'What would we gain by doing this?'. That is, what's wrong with matching the tool to the job?

Glor said...

Good point Brian. Although I'd never go so far as to use NPRS over BEAST simply because the former is available in R, I'm definitely guilty of using some functions in R that may be less than state of the art. One example is the use of R's ancestral reconstruction algorithms instead of the Bayesian methods implemented by Pagel's BayesTraits program. My justification for doing this is that BayesTraits is very good at doing a few relatively narrowly defined functions, but doesn't offer the type of flexibility with analytical methods and data manipulation that is provided by the R framework. In any case, I don't think there's anything wrong with encouraging use of R and hoping that its toolbox continues to grow, while remaining mindful of the fact that it is never likely to be a substitute for learning other applications.

Liam Revell said...

I hope I'm not misinterpreting Brian's comment when I add that I think he might also mean that some functions (for example, computationally intensive likelihood estimation) might be better implemented in a compiled language than they will ever be in an interpreted language such as R.

Glor said...

I suspect Brian was making this point as well Liam. I'm certainly not expecting to use R for computationally expensive exercises like tree building anytime soon.

Dan Rabosky said...

I agree with Brian's comment in broad outline, at least with respect to anything truly demanding in terms of computation. However, I think a counterpoint to all of this is that R has some good and powerful tools for mathematical and numerical analysis that have been implemented and tested by people far more savvy about this sort of stuff than I am. So....given a choice between doing optimization with a homemade algorithm in C (or perhaps an improperly implemented hack from Numerical Recipes), I will choose R .

Having said this, it is pretty easy to misuse optimization (and other) tools however implemented, and in this sense R makes it deceptively easy to get garbage output.