Most of the questions below are from me (LH) but a couple come from Dan Rabosky (DR). Many thanks to Joe for participating.

JF: The availability of genome-scale information is certainly one. The arrival of a generation of young researchers who are comfortable with statistical and computational approaches is another. But the most important development is reflected in recent work on coalescent trees of gene copies within trees of species. What this does is tie together between-species molecular evolution and within-species population genetics. Those two lines of work have been developing almost independently since the 1960s. But now, with population samples of sequences at multiple loci in multiple related species, they are coming back together. This is not another Modern Synthesis, but it is a major event that needs a name. How about the "Family Reunion"? Long-estranged relatives who have not been in touch are getting together.

LH: Take us back to the beginnings, back when you were working on phylogenetic and comparative methods for your PhD thesis. Where did you derive your inspiration? Did you anticipate the impact that this work would have on the field?

JF: I did not anticipate it at all. My original thesis project with Dick Lewontin was a rather grandiose theoretical population genetics macroevolution model -- my idea, not his. It didn't work out and I didn't have any useful results. Meanwhile Lynn Throckmorton and Jack Hubby, whose labs were nearby, needed someone to write a clustering program for protein electrophoresis band data that they had in multiple Drosophila species. I volunteered and was fascinated by the algorithms. I went on to write parsimony programs for the Camin-Sokal, Dollo, and polymorphism parsimony criteria, and then to work on how to infer trees by likelihood using Anthony Edwards and Luca Cavalli-Sforza's Brownian motion approximation to gene frequency drift. Dick finally suggested that I write this up for my thesis, which I did in 1967 (the degree was officially 1968). Through the 1970s I maintained a sideline of work on trees while mostly working in theoretical population genetics. It was really not until about 1978 that I began to see that this was becoming more important, and that it fit in with my interest in evolution beyond the species boundary. So I shifted my work toward trees and dropped out of theoretical population genetics.

DR: A lot of what we do in comparative methods is based on Brownian motion, or models for which BM is a special case (e.g., OU). As you (Felsenstein) have written, "Brownian motion is a poor model, and so is Ornstein-Uhlenbeck, but just as democracy is the worst method of organizing a society 'except for all the others', so these two models are all we've really got that is tractable. Critics will be admitted to the event, but only if they carry with them another tractable model."

And for discrete traits, we use Markovian models that assume (generally) homogeneous rates through time and among lineages. Undoubtedly, the math for this could get out of hand, but at some point I think we'll have to do something to explore (among other things) more realistic constraint surfaces, etc.

Given this, what do you view as "the frontier" for models of continuous and discrete character evolution? New mathematics? Approximate Bayesian approaches that rely on simulation to deal with analytically intractable scenarios?

JF: Hard to see what. I think one framework will be models in which a population "chases" an adaptive peak which is moving. But we need to have some model for how the peak moves, and aside from having a mechanistic and ecological model of the function of the character this is not forthcoming. Nor is it easy to see how adaptive peaks in sister species become different from each other. We're also going to find that the amount of information available to tell different schemes of selection pressure apart will be small. We are going to have to be able to characterize what we can and can't know given the data. Just adding new mathematical tools or lots of simulation will not resolve these dilemmas.

DR: What do you think about the unification of modern (neontological) comparative biology with paleontology? There seems to be a lot of room for progress in this area. Do you have any suggestions for future directions?

JF: Oh thank you thank you thank you for giving me an opportunity to mount the soapbox and hold forth on one of my favorite topics. I've been working on this. See my paper in 2002:

Felsenstein, J. 2002. Quantitative characters, phylogenies, and morphometrics. pp. 27-44 in Morphology, Shape, and Phylogenetics, edited by N. MacLeod. Systematics Association Special Volume Series 64. Taylor and Francis, London.

and watch my Julian Huxley Lecture to the Systematics Association in London in 2008, which is available as a video along with a PDF of my slides.

Basically we can infer the tree of present-day species from molecular data, and then use it for morphological characters (or other measurable continuous or discrete characters) with a Brownian or OU model, to infer phylogenetic covariances of changes of characters. Then we can use these together with the fossil morphology to help place the fossils. (One could also use all this together in a giant likelihood or Bayesian inference, but the gain in doing so will be very small, as the morphology will add little to the inference of the tree, I think.) One can also use bootstrap samples of trees in this, or samples from Bayesian posteriors.
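To illustrate just the first step of that pipeline, here is a minimal sketch (an editorial illustration, not Felsenstein's code) of how a Brownian-motion model turns a rooted tree with known branch lengths into covariances among tip values: under BM the covariance of two tips is the rate sigma2 times the branch length they share from the root down to their most recent common ancestor. The three-tip tree, its branch lengths, and all function names are hypothetical; an OU model would need a different covariance kernel, and the fossil-placement step is not attempted here.

```python
"""Brownian-motion covariance matrix for tip values on a rooted tree (illustrative sketch)."""

# Hypothetical rooted tree: each node maps to (parent, branch length to parent).
# The root has parent None. Tips are A, B, C; internal nodes are "root" and "AB".
TREE = {
    "root": (None, 0.0),
    "AB": ("root", 1.0),
    "A": ("AB", 2.0),
    "B": ("AB", 2.0),
    "C": ("root", 3.0),
}


def path_to_root(tree, node):
    """Return the list of nodes from `node` up to the root (inclusive)."""
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path


def depth(tree, node):
    """Total branch length from the root down to `node`."""
    return sum(tree[n][1] for n in path_to_root(tree, node))


def shared_path_length(tree, a, b):
    """Root-to-MRCA distance: the branch length that nodes a and b share."""
    ancestors_of_a = set(path_to_root(tree, a))
    # Walk up from b until we reach the first node that is also an ancestor of a.
    for node in path_to_root(tree, b):
        if node in ancestors_of_a:
            return depth(tree, node)
    raise ValueError("nodes are not in the same rooted tree")


def bm_covariance(tree, tips, sigma2=1.0):
    """Covariance matrix of tip values under Brownian motion with rate sigma2."""
    return [[sigma2 * shared_path_length(tree, a, b) for b in tips] for a in tips]


if __name__ == "__main__":
    for row in bm_covariance(TREE, ["A", "B", "C"], sigma2=0.5):
        print(row)
```

With sigma2 = 0.5, tips A and B share one unit of branch length (covariance 0.5), C shares nothing with them (covariance 0.0), and every tip sits three units from the root (variance 1.5).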

There is lots to be done here and I am rushing to do it, and working with Fred Bookstein on the morphometric angles to this too. I wonder whether statistical frameworks such as this, together with within-species quantitative treatment, will not be important in untangling the paleoanthropological mess caused by nonquantitative approaches to hominoid fossils.

LH: What do you think about the current trend in phylogenetics (and, lately, comparative biology) towards Bayesian approaches?

JF: I am a curmudgeon on this, in that Bayesian approaches do not feel right to me. So I have been resisting them. Bayesians were unhappy with the treatment of Bayesian Inference in my book, in that I did not give them four chapters, the last of which ended by declaring victory. I think we're all Bayesians when we come to cross the street, balancing evidence of approaching cars against our priors. But that's where one of the criticisms of Bayesianism comes in -- do we all have the same priors? Is there necessarily a single prior that you can use that will be broadly acceptable to your readership? If not, then maybe the reader of the paper should instead be given the likelihood curve so they can apply their own prior to it. For phylogenies, priors giving equal probability to all topologies (or to all labeled histories) would be noncontroversial. But the part of the prior that puts distributions on branch lengths could be wildly controversial. There is also the issue of whether some things, such as whether the sun will rise tomorrow morning, really should have a prior.
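A small numerical sketch of the point about branch-length priors (an editorial illustration, not part of the interview): assume two aligned sequences that differ at K of N sites, compute the Jukes-Cantor likelihood curve for the branch length t on a grid, then combine that same curve with two different exponential priors on t. The data (N = 100, K = 20), the grid, and the two prior means are arbitrary assumptions chosen only to make the contrast visible.

```python
"""Same likelihood curve, different branch-length priors (illustrative sketch)."""
import math

# Hypothetical data: two aligned sequences differing at K of N sites.
N, K = 100, 20


def jc_likelihood(t, n=N, k=K):
    """Jukes-Cantor likelihood of branch length t (expected substitutions per site)."""
    p = 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))  # probability that a site differs
    return p ** k * (1.0 - p) ** (n - k)


def exponential_prior(mean):
    """Return an exponential prior density on t with the given mean."""
    return lambda t: math.exp(-t / mean) / mean


def grid_posterior(likelihood, prior, grid):
    """Normalise likelihood x prior over an evenly spaced grid of t values."""
    weights = [likelihood(t) * prior(t) for t in grid]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_mean(grid, post):
    """Mean of t under a discrete posterior on the grid."""
    return sum(t * w for t, w in zip(grid, post))


GRID = [0.001 * i for i in range(1, 2001)]  # t from 0.001 to 2.0

# The likelihood curve itself: its highest point is the ML estimate of t.
t_ml = max(GRID, key=jc_likelihood)

# Two hypothetical readers with different priors on the branch length.
post_short = grid_posterior(jc_likelihood, exponential_prior(0.01), GRID)
post_long = grid_posterior(jc_likelihood, exponential_prior(1.0), GRID)

print(f"ML estimate of t: {t_ml:.3f}")
print(f"posterior mean, prior mean 0.01: {posterior_mean(GRID, post_short):.3f}")
print(f"posterior mean, prior mean 1.0:  {posterior_mean(GRID, post_long):.3f}")
```

The likelihood curve (peaking at about t = 0.233, the usual Jukes-Cantor distance for 20% differences) is identical for both readers; only the prior differs, and the prior concentrated on short branches pulls the posterior mean well below the likelihood peak.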

People should be Bayesians if that fits with their philosophy of doing science. But not just because a Bayesian program happens to run faster than a non-Bayesian one. They should also realize that we will continue to have both Bayesians and non-Bayesians. Biologists sometimes think that this controversy emerged in their field and will be settled there -- that one more really good argument and everyone will become a Bayesian. They might not be aware that Bayesian arguments have been around since 1764. There is no new decisive argument that's going to arise in our field.

The issue to contemplate is the priors, not the details of MCMC techniques. We have not yet seen a case where an important conclusion depends strongly on what prior you assume. Perhaps we never will, but if a case like that arises, and causes trouble for Bayesian approaches, people should not be too surprised.

LH: Your work has inspired a generation of comparative biologists. Any advice for those of us just starting out on our careers?

JF: I have too many opinions on that for this forum. I guess I would urge people to take a long view and to realize that it takes time for methods to be developed, published and used, and to prepare themselves for the new forms of data that are coming. When I submitted my 1985 comparative methods paper, the referees were dubious about it because it required phylogenies, whereas they felt that only classifications were going to be available! A year or two earlier and it might not have been accepted for publication. I would also urge people to become familiar not only with phylogeny methods and statistical techniques, but also with the theoretical side of evolutionary biology. We're entering a period when there is going to be a merger (or Reunion) of between-species phylogenetic inference and within-species population genetics. I'm worried that we are graduating too many people who know what Subtree Pruning and Regrafting is, but who have no idea what Wahlund's Law is, or how mutational load arguments work. Theoretical population genetics is in danger of becoming a lost art, just when it is most needed. Comparative biologists should learn it -- and teach it.

## 17 comments:

I think the recent migration back to maximum likelihood that we've seen with the release of programs like Garli and RAxML is evidence that most systematic biologists who use model-based methods are not philosophically Bayesian, they just use BI because (at least for the last eight years) it's been faster than the alternatives.

It may not mean that they are philosophically in favor of ML either ...

Nice interview, thanks for posting.

Exactly. I think most practicing systematists are much more concerned with empirical issues than with philosophical ones.

I think the main reason we see so many Bayesian approaches featured in the journals is that one cannot get a mere parsimony analysis published anymore (or even parsimony + ML). Well, maybe in Cladistics, but nowhere else.

I'm surprised to hear that. There was a time about 1990 when journals stopped taking papers that just made a "point estimate" of the best tree, without some indication of the uncertainty of the estimate. Is that what you are seeing? If so, I'm not alarmed. Or are the journals insisting that you must be a Bayesian?

Sorry, the above comment was really by me, but my browser window was acting up.

I'm not certain that most people really understand the difference between frequentist and Bayesian methods of statistical inference. There might be something to the idea, expressed in the above comments, that biologists simply use the method that gives them an answer for the problem at hand in a reasonable time, regardless of the philosophical issues.

I have been quoted as saying that parsimony is based on a bunch of "philosophical mumbo-jumbo" (a misquote I will stand by). That said, I do think philosophy plays a role in science, and that when people estimate phylogeny, they should at least consider the philosophical issues. I think Bayesian analysis describes how an ideal scientist (or rational person) should act. I realize, however, that one cannot necessarily mimic this rational behavior in any particular analysis. At best, Bayesian analyses can only approximate this ideal behavior.

My own attraction to Bayesian analysis started from a more practical perspective. I used to do lots and lots of maximum likelihood estimation in the 1990s. As the models I was interested in became more complicated, the estimation procedure started grinding to a halt. Initially, at least, I turned to Bayesian MCMC analysis as a practical way to evaluate interesting and parameter-rich models. Being a flexible person, I was able to adjust my philosophy to these new analyses :-)

Today, the most exciting and important analyses in the field use a Bayesian MCMC methodology. Just look at the excellent work by Nicolas Rodrigue, for example, in which he incorporates information about the free energy of protein secondary structure and the codon structure in a framework with arbitrary non-independence of substitutions (a model with 4^L possible states, where L is the sequence length). This is really nice work which, practically at least, would be difficult to achieve in any other framework. (He even uses a DPP model to describe variation in nonsynonymous rates of substitution across a sequence. Cool.) Similarly, there is a lot of activity in model-based phylogenetic analyses that treat the alignment as a random variable. This type of work has also used a Bayesian MCMC framework.

Like many others, I thought that the genetic algorithms implemented in Garli and RAxML might be the key to the reliable analysis of large phylogenies. I am not so certain any more, after having tried the methods out on large data sets. (I am most wed to the use of the likelihood function, and only secondarily to Bayesian analysis, so I was willing to try the methods out.) I have run into the types of problems that many people run into with MCMC or optimization algorithms. On the other hand, I think there is some real potential for using MCMC to analyze large data sets, such as biasing the proposal mechanism towards trees with good parsimony or likelihood scores (and correcting for that bias in the Hastings ratio). Interestingly, there are all sorts of good ideas that have come from the parsimony people--who are quite good at finding optimal trees for big phylogenetic problems--that we can exploit in constructing good MCMC algorithms.
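The idea of biasing proposals toward good-scoring trees and then correcting in the Hastings ratio can be shown on a toy state space standing in for tree space. This is a hedged, editorial sketch, not an implementation from any program mentioned here: the four "trees" are plain integers, their posterior weights and the proposal probabilities are hypothetical, and the proposal ignores the current state (an independence sampler). Because the proposal q is asymmetric, a move i -> j is accepted with probability min(1, pi(j) q(i) / (pi(i) q(j))); the code builds the exact transition matrix and checks that its stationary distribution is still the target pi.

```python
"""Metropolis-Hastings with a biased (independence) proposal on a toy state space."""

# Hypothetical target: unnormalised posterior weights for 4 "trees".
TARGET = [1.0, 4.0, 2.0, 3.0]

# Hypothetical biased proposal: favours some states (think "good parsimony score"),
# independent of the current state. Probabilities sum to 1.
PROPOSAL = [0.1, 0.5, 0.1, 0.3]


def acceptance(i, j, target=TARGET, q=PROPOSAL):
    """Hastings acceptance probability for a move i -> j proposed with probability q[j]."""
    ratio = (target[j] * q[i]) / (target[i] * q[j])
    return min(1.0, ratio)


def transition_matrix(target=TARGET, q=PROPOSAL):
    """Exact MH transition matrix P[i][j] for the independence sampler."""
    n = len(target)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                P[i][j] = q[j] * acceptance(i, j, target, q)
        # Rejected proposals (and proposals of i itself) leave the chain at i.
        P[i][i] = 1.0 - sum(P[i][j] for j in range(n) if j != i)
    return P


def stationary(P, steps=2000):
    """Approximate the stationary distribution by repeatedly applying P to a uniform start."""
    n = len(P)
    dist = [1.0 / n] * n
    for _ in range(steps):
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return dist


if __name__ == "__main__":
    total = sum(TARGET)
    print("target:    ", [round(w / total, 4) for w in TARGET])
    print("stationary:", [round(x, 4) for x in stationary(transition_matrix())])
```

Note that the move from state 0 to the heavily proposed state 1 is accepted only with probability 0.8 even though state 1 has higher posterior weight: the correction discounts moves the proposal already favours. Dropping the q(i)/q(j) factor would, in general, make the chain converge to a distribution skewed toward those favoured states.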

As an aside, Joe's Ph.D. dissertation came up in the interview. His dissertation was 20 or 30 years ahead of its time. Interestingly, the work in his dissertation describes the phylogeny problem in a Bayesian context. He describes posterior probabilities--or at least, the probability of a tree conditional on the observations. The dissertation is also the first place I have seen a credible set of trees described. Even though Joe doesn't agree with Bayesian methods, he should get credit for being the first to describe the method coherently as applied to the phylogeny problem. He must have changed his mind since.

The people I know who were disappointed with Joe's treatment of Bayesian analysis in his book were not "unhappy with the treatment of Bayesian Inference in [his] book, [because he] did not give them four chapters, the last of which ended by declaring victory". Nobody I know is that simple-minded. They were unhappy with the treatment of Bayesian methods because they feel he did a poor job of describing the strengths as well as the weaknesses of the method. For that book, I skip the Bayesian chapter--pretending it doesn't exist--and read and reread the chapters on the history of the field and the coalescence.

If we have partly criticised parsimony because some cladists have advocated its use on a philosophical basis, why should we be concerned about philosophical issues now? Is that not a blatant double standard?

Methods should be preferred in terms of empirical performance. However, are parsimony and ML, as implemented in TNT and RAxML respectively, really one step ahead of BI in analysing large/complex data sets in reasonable times?

Is there no middle ground between philosophical jumbo-muboism and naked operationalism? Is it unreasonable to judge a method both on its philosophical coherence and the reliability of its performance?

I wouldn't count parsimony out. It is, after all, consistent over a reasonable region of parameter space. And speed isn't a negligible criterion. You just have to ask yourself whether you are in the right region.

And my experience with large data sets (though what I really should say is "a large data set") is that Garli and RAxML both have performance superior to MrBayes. They converge in a reasonable time, they converge on similar trees between runs and between programs, and those trees are not obviously bizarre (to the extent that I have prior expectations of the topologies). Caveats: I haven't exhaustively explored MCMC parameters. Advice on good parameter choices would in fact be appreciated.

And I agree that almost none of the choice of methods among working systematists is due to philosophical or theoretical preferences, outside the Hennig Society, but has to do with perceived empirical and practical considerations. "Does it work?" is the main question.

"Methods should be preferred in terms of empirical performance. However, are parsimony and ML, as implemented in TNT and RAxML respectively, really one step ahead of BI in analysing large/complex data sets in reasonable times?" The answer is yes! As far as I know, the largest data set successfully run with BI was Zilla (500 rbcL sequences [Ann. Miss. Bot. G. 80: 528]), in the paper about the "parsimony" model [Syst. Biol. 57: 406]. Compare that with the 73060-terminal Behemoth of Goloboff et al. [Cladistics in press, doi: 10.1111/j.1096-0031.2009.00255.x] using TNT.

For the record, the largest matrix analysed with RAxML had 13000 terminals [BMC Evol. Biol. 9: 37].

Yes, but are the people who are arguing for superior performance of RAxML and GARLI comparing apples to apples? A full Bayesian run gets clade posterior probabilities, so what it needs to be compared with is a full bootstrap analysis.

If you are just using MrBayes to search tree space for a point estimate then is that overkill?

I have an admission that is vaguely related to the present post as well as the previous discussion led by Brian.

I have previously committed philosophical hybridization crimes such as swapping only on the credible set of trees from BI in order to heuristically check whether (or increase the probability that) the tree space search saved from my BI runs actually found the ML tree for a particular parameter set, or quickly see if a misspecified parameter set--significantly deviating from the posterior estimates--would yield significantly different point estimates of topology. For the latter, I could just check the topology of the ML tree(s) against the credible set of previous analyses.

Does anyone wish to wag a philosophical finger of shame? Can my sins be forgiven?

I hope the hopelessly long first sentence in the previous comment will be vaguely comprehensible.

I can rephrase it, if someone cares for a clarification.

At least one of the people asserting the advantages of Garli/RAxML over MrBayes is comparing Bayesian runs, with node posteriors, to bootstraps.

I think the primary goal of developing a coherent philosophy of inference should not be to formulate a set of restrictions (dogma), but to allow better interpretation of the relationship between data and hypotheses. I admit that I often find myself guilty of the former.
