Tuesday, June 17, 2008
Blast Tree View
While putzing around NCBI, I just noticed that Blast tools now include something called Blast Tree View, which "features new distance measures, tree downloading, re-rooting, simplification and sequence grouping." This could either be a nice way to quickly visually assess quirks in your search results or the beginning of an avalanche of disastrously unprofessional figures in prominent journals. I won't venture to guess which way it will go.
6 comments:
NJ/FastME with MaxSeqDiff has been there for a while, but the protein matrices (Kimura and Grishin) are new, and so is the Newick tree download. I am not familiar with the Grishin protein matrices; Grishin has a bunch of papers on protein structural alignments. The NCBI folks should add a disclaimer in a big font that this is just a guide tree based on pairwise BLAST alignments, NOT to be used as the definitive phylogenetic reconstruction.
Even with the disclaimer I think this one is going to go the direction that Poletarac feared.
These posts remind me of a problem I'm facing with increasing frequency: trying to help people build trees who have little time or interest to do it properly. I've gotten into the habit of asking anybody who asks for help with a tree: "How much time have you budgeted for this project?" The usual answer from my colleagues in molecular and cellular biology is: "An hour or so." Of course that's barely enough time to download MrBayes and run one of the sample files. As a result, most of them use simple distance methods available in software packages they're already familiar with. Sadly, I think many phylogenies made in this manner make it to publication. What to do?
It used to be that simple, "fast" tree building algorithms had a place - there simply wasn't enough computing power to do rigorous search methods. However, that is really not the case anymore, especially at the scale of tens of species. I'm a little surprised that people who are presumably mathematically sophisticated (bioinformatics, after all) would be so simplistic with their phylogenetics.
I've got a pretty slick pipeline with a web interface just about set up for when molecular biologists ask me for help. It's also useful for my lab to run a quick tree, since we need to know something about some gene tree or another all the time. Here is what it does:
1. User pastes protein of interest into web form. They also select how many nearest blast hits they want to retain for their gene tree. Then they hit the "start" button.
2. First, their protein is blasted against the "uniref" database. This database clusters proteins by similarity, and retains only a single protein as an exemplar for each cluster. The top X blast hits are retained, as specified in the web form.
3. The pipeline uses MUSCLE to align the bait protein and the retained blast hits.
4. The pipeline changes the file format and sticks the aligned sequences into RAxML. The default for the pipeline is a WAG + gamma model.
5. The tree is written to a URL, and a link is generated to PhyloWidget, which displays the tree very nicely and can print it.
6. I am working on putting links into PhyloWidget so that each OTU is hyperlinked back over to the uniref database.
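For anyone curious what the middle of that looks like, here is a rough Python sketch of steps 2 through 4. It is not the actual pipeline code: the file names, the function names, and the "uniref90" database name are placeholders, and it assumes BLAST+ (blastp), MUSCLE v3-style flags, and raxmlHPC are what is installed. It only builds the command lines and does the format change, without running anything.

```python
from typing import Dict, List


def hit_ids_from_tabular(tsv_text: str, n_hits: int) -> List[str]:
    """Return the first n_hits unique subject IDs (column 2) from BLAST tabular output."""
    ids: List[str] = []
    for line in tsv_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        if fields[1] not in ids:
            ids.append(fields[1])
        if len(ids) == n_hits:
            break
    return ids


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, keeping input order."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first token of the header is the ID
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def fasta_to_phylip(text: str) -> str:
    """Convert an aligned FASTA string to relaxed PHYLIP (the step 4 format change)."""
    seqs = parse_fasta(text)
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned to the same length")
    lines = [f"{len(seqs)} {lengths.pop()}"]
    lines += [f"{name} {seq}" for name, seq in seqs.items()]
    return "\n".join(lines) + "\n"


def pipeline_commands(query: str, n_hits: int, db: str = "uniref90") -> List[List[str]]:
    """Build (but do not run) the external commands for steps 2 to 4.

    All file names here are hypothetical. Fetching the hit sequences and
    merging them with the bait into combined.fasta is glue left out of
    this sketch.
    """
    blast = ["blastp", "-query", query, "-db", db,
             "-max_target_seqs", str(n_hits),
             "-outfmt", "6", "-out", "hits.tsv"]
    # MUSCLE v3 syntax; newer MUSCLE releases use different flags.
    align = ["muscle", "-in", "combined.fasta", "-out", "aligned.fasta"]
    # PROTGAMMAWAG = WAG substitution matrix with gamma rate heterogeneity.
    tree = ["raxmlHPC", "-s", "aligned.phy", "-n", "genetree",
            "-m", "PROTGAMMAWAG", "-p", "12345"]
    return [blast, align, tree]
```

Each command list could be handed to subprocess.run in order, with fasta_to_phylip applied to the MUSCLE output before the RAxML call.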
It's not quite ready for prime time, but very close.
So, a molecular biologist who needs a gene tree for their gene can paste in their sequence, and get a pretty state-of-the-art ML tree, at the touch of a button. (The pipeline takes ~15 minutes to execute)...
I am so tired of seeing papers in developmental bio journals and mol biol or biochem journals where the tree is "THE ClustalW" tree. This is really common, and it's just the tree spit out by Clustal pre-alignment....
I am trying to decide if I want to make this website public. But I worry that it would really eat away all my CPU power. So, I'll probably just give the password to people who ask me for it....
Todd - This would definitely be an improvement over the existing methodologies. Perhaps you could get somebody like CIPRES to host the pipeline?
I think, more importantly though, that people need to understand that phylogenetic analyses represent an entirely distinct field of study and cannot be implemented correctly by a naive researcher. I've tried to use an analogy for my colleagues in cell biology. I say that phylogenetic analyses run in fifteen minutes are like the results a cell biologist might have obtained from a 19th century microscope. You're going to see some interesting things, but you're going to miss lots of other stuff, and you're probably going to get some stuff wrong too.