Wednesday, July 30, 2008

Phylota: My Mind is Blown

In the June issue of Systematic Biology, Sanderson et al. report the availability of the PhyLoTA browser. If you haven't checked this out yet, do - it can be a real boon to phylogenetic analyses in the age of bioinformatics. For clades in the Tree of Life, PhyLoTA identifies clusters, which are sets of sequences for the same gene across sets of taxa. For example, here are some clusters for the geckos. You can see the big hitters in there: c-mos, 12S, 16S, cytochrome b. There are a few decent data sets for other genes as well. You can then download preliminary alignments for the clusters for your phylogenetic analyses.
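Since these alignments are preliminary, it's worth a quick sanity check before feeding one into an analysis. Here's a minimal sketch of the kind of thing I mean, assuming the alignment is saved as plain FASTA. The function names and the toy gecko sequences are made up for illustration, and none of this is part of PhyLoTA itself.

```python
from io import StringIO


def read_fasta(handle):
    """Return a dict mapping each FASTA header to its full sequence.

    Note: a repeated header overwrites the earlier record.
    """
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            # Sequences may be wrapped over several lines.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def alignment_summary(records):
    """Report how many sequences there are and whether all share one length."""
    lengths = {len(seq) for seq in records.values()}
    return {
        "n_sequences": len(records),
        "aligned": len(lengths) == 1,
        "lengths": sorted(lengths),
    }


# Toy alignment with invented sequences (hypothetical data, not from GenBank).
example = StringIO(""">Gekko_gecko
ATG-CCGTA
>Hemidactylus_frenatus
ATGACCGTA
>Phelsuma_madagascariensis
ATG-CCG
TA
""")

summary = alignment_summary(read_fasta(example))
print(summary)
```

For a real file you would pass `open("my_cluster.fasta")` (a hypothetical file name) instead of the `StringIO` object. If `aligned` comes back `False`, the file is probably unaligned sequences rather than an alignment.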

If you've ever tried to do this sort of thing directly in GenBank, the value of this will be immediately apparent. This is particularly useful if you, like me, suck at sequencing.

As a side note, I talked to a colleague who got harassed at the Ichs and Herps meeting for... gasp... downloading sequences from GenBank and using them without asking the authors' permission! Good lord, what is the world coming to? I'm surprised to hear of such active resistance to public availability of information.

UPDATE: Further discussion of the final paragraph at <bbgm>.

6 comments:

Weisrock Lab said...

Another useful tool is the Geneious Linnaeus BLAST browser (http://www.geneious.com/default,634,linnaeus.sm). It will take a particular sequence and return all BLAST hits in a taxonomically structured tree format. Blocks of sequences can be viewed in fasta format and copied for use. It does not provide alignments (although Geneious itself will do this).

Susan Perkins said...

This is really, really cool. I did some searches for my favorite organism name (Plasmodium, of course) and it looks about what I expected. The ability to so quickly download alignments is really great. I do worry, though, that people are just going to take those trees and plunk those figures into papers...they're of course not an end-all, be-all phylogeny.

Anonymous said...

Hey.. thanks for the plug! We share the concern about using the phylogenies as is - hence the 'provisional' label and footnote about the reconstruction methods. I think of the main resource in Phylota as the cluster sets, with the alignments and trees thrown in as a bonus. Now, the question is... what can we do with 80,000 alignments?

Glor said...

I sincerely hope that people will heed Karen/Phylota's advice.

I've played with Phylota a bit since Sanderson introduced it to the Bodega group and wish I had more time to do so now. Definitely an amazing resource.

No big surprises in Anolis - Jason Kolbe and I still hold the sequencing crown with more than half of the available sequences coming from our pipettors :)

Jonathan Eisen said...

Welcome to the world of "pretend" openness. Researchers get credit from the funding agencies for releasing data to GenBank at some point. In fact it might be required. And then they give you grief about using it. This has been happening in genomics for quite some time, I am sorry to say.

Poletarac said...

Hm... I just looked at my favorite plant family and found that PhyLoTA has a single orthologous gene under balancing selection split into (at least) three orthologous clusters. In each cluster, the phylogeny is patently incorrect because of the lineage sorting issues. Bummer.

It sure makes the sequence collection from GenBank easy, though.