Tuesday, July 7, 2009

Large Tree Extraction, Viewing, and Printing

Extracting trees from various idiosyncratically formatted databases is a pain, as is the task of printing very large trees for visual inspection or showing off. Many software packages are competent or good at one or both sets of tasks--I use some combination of the APE package for R and FigTree, which work well for most of my needs, but not all. There are two new tools that help partly solve a couple of vaguely related problems.

Archaeopteryx, written by Christian Zmasek, is a Java-based application potentially useful in broad comparative analyses for extraction of trees from various source formats. With open source Java and Ruby libraries, it reads and displays trees in common formats (e.g. Newick and NEXUS) as well as NCBI Taxonomy, TOL, and PhyloXML. For a nice example, take a look at the amphibian tree, then go to 'View as Text' and copy the Newick-formatted tree. Why? Because printing this or any other large tree used to cost you a nice chunk of time and a moderate amount of hassle. No longer.

Tred, Rick Ree's web-based tree printing tool, can import, manipulate, and print very large trees. The most exciting feature for me is the duplication of sections near the margins for easy pasting of multi-page trees! The amphibian tree can be easily printed over three pages, taped together at convenient overlaps, and posted above your desk.
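For the curious, the overlapping-margin idea is easy to sketch. The Python below is not Tred's code, just a minimal illustration of how a tall drawing might be cut into page-sized strips that each repeat a little of the previous one. The function name `page_windows` and its parameters are made up for this example, and the units are whatever you measure the drawing in (inches, points, pixels).

```python
def page_windows(total, page, overlap):
    """Split the range [0, total] into page-sized spans.

    Each span is at most `page` long, and consecutive spans share
    `overlap` units so the printed sheets can be taped together.
    Returns a list of (start, end) tuples.
    """
    if page <= 0 or not 0 <= overlap < page:
        raise ValueError("need page > 0 and 0 <= overlap < page")

    step = page - overlap  # how far each new page advances
    windows = []
    start = 0
    while True:
        end = min(start + page, total)  # the last page may be shorter
        windows.append((start, end))
        if end >= total:
            break
        start += step
    return windows


# A drawing 30 units tall, on pages 11 units tall, with 1 unit repeated.
print(page_windows(30, 11, 1))  # [(0, 11), (10, 21), (20, 30)]
```

Each strip starts one unit before the previous strip ends, which is what gives you something to line up when taping the sheets together.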

I can barely contain the urge to outdo the Hillis and Bull Lab tree, pictured above, by using the above tools to print out and unfurl a gigantic tree of life from the Sears Tower. (Never mind the uncertainty or accuracy.) A cheap way into the Guinness Book of World Records?

[Thanks to MudPuppy for pointing out Archaeopteryx!]

22 comments:

Glor said...

Sweet, I'll look forward to experimenting with Tred's multi-page printing. This has proven to be the one frustrating limitation of FigTree (am I missing something?). Also worth mentioning in this context is Sanderson's program paloverde.

Paul Gardner said...

That all sounds very interesting. I use FigTree also. Haven't tried APE yet -- what advantages does it have? Have you seen Dendroscope by Daniel Huson? It has some very nice features.

Joe Felsenstein said...

Thanks for pointing out these programs. I just want to note one "problem" that users of my programs often point out. They email me and complain that they tried to print out their tree of 500 tips on one page -- and they are very upset that the names are then too small to read! It seems to them that this is our fault.

Paul Gardner said...

I could definitely see this happening. Dendroscope and Archaeopteryx solve this by selecting representative tip-names/species for a closely-knit cluster. The only problem I see with this approach is that often you would have a preferred name you would like displayed. E.g., in a tree containing drosophilids you would generally prefer this group to be represented by D. melanogaster rather than some other random drosophilid. I haven't seen anything that lets you set this (yet).
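To make the wish concrete, here is a minimal Python sketch of what such a setting might do. It is not how Dendroscope or Archaeopteryx choose representatives; the function name `pick_representative` and the idea of passing a priority-ordered list of preferred names are assumptions for illustration only.

```python
def pick_representative(cluster, preferred):
    """Choose the label to display for a collapsed cluster of tips.

    `cluster` is the collection of tip names in the cluster and
    `preferred` is a user-supplied list of names in priority order.
    The first preferred name found in the cluster wins; otherwise
    fall back to the alphabetically first tip so the choice is
    at least reproducible.
    """
    members = set(cluster)
    for name in preferred:
        if name in members:
            return name
    return sorted(members)[0]


drosophilids = ["D. yakuba", "D. simulans", "D. melanogaster"]
print(pick_representative(drosophilids, ["D. melanogaster"]))  # D. melanogaster
```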

Dana said...

I find FigTree to be *much* more stable under an OS X Java environment. I have continuous lockups and crashes on Windows platforms, and it can take forever to open a file with lots of trees.

fdelsuc said...

In my opinion, one of the nicest and most distinctive features of Dendroscope is its capacity to automatically scan a folder of pictures and, based on their filenames, display them at the tips of trees.

Poletarac said...

Hi Joe, that's pretty silly. As a user (rather than a developer) I'm embarrassed that things like that actually happen. That's a couple of steps below not RTFM.

Paul, if you look at the Archaeopteryx page, you'll find a list of alternatives near the bottom, including Dendroscope. It looks like it does a lot of stuff. I like Rick's program for printing trees because it's so simple and unlikely to incite complaints. It fits well with this one tool for one job philosophy. Archaeopteryx is potentially useful for developers because one needn't re-invent the ways to parse various databases to extract tree information. Zmasek's libraries can be used to do so, instead.

You can find out more about APE here, and do things like automate bootstrap support placement.

Poletarac said...

In reference to Ree's Tred, I meant "his" one tool for one job philosophy.

John Harshman said...

A related question, while I have your attention: Does anyone know of a program that will let you assign colors individually to particular branches or nodes?

One way to do this is to save a tree in PAUP or MacClade, import it into a graphics program, and hope that importation leaves the tree both undamaged and still represented as a group of line segments. In my experience, these hopes are seldom fulfilled. It would be nice to have a program that realized your graphic was a tree and still allowed you to manipulate it graphically.

Anyone?

Glor said...

@Harshman
You should be able to manipulate individual branches in a program like Adobe Illustrator if you export the tree image as a PDF file. You can also change colors of individual branches in FigTree. The best solution, however, may be R, which offers the ultimate flexibility in terms of tree manipulation.

Roderic Page said...

Personally for web-based tree viewing I'd like to avoid Java applets as much as possible (e.g., Archaeopteryx), although PhyloWidget is pretty nice.

I think we can do a lot just using HTML and JavaScript. My own crude efforts in this regard (tvwidget, see example here), plus the JavaScript Infovis Toolkit (JIT) used by Rick Ree, show what is possible. Having seen JIT I think the <canvas> tag has a lot of potential.

Brian O'Meara said...

I recently added Archaeopteryx for tree visualization to my mirror of TreeBASE and found it pretty easy to do. I agree, though, that Java isn't as good as some other technologies, but for me, getting Archaeopteryx installed just seemed faster than other options (as the mirror was just something I threw together while procrastinating one day and will soon be obsolete, anyway).

John Harshman said...

Let me know if this is inappropriate, but I have tried exporting pdfs from FigTree into Illustrator. The tree itself comes through fine.

The problem here is with the taxon names, for which FigTree has three options. If you save them as shapes, they are imported as shapes; as advertised, but not useful. If you save them as type 3 fonts, Illustrator ignores them (telling you so when you open the file) and you get a naked tree. If you save them as type 1 fonts, Illustrator crashes; at least mine did.

On the bright side, branch coloring works just fine in FigTree.

I haven't attempted R yet.

Poletarac said...

John, if you are going to use Illustrator anyway (for whatever reason), it seems like you can get a lot done with R+ape and Illustrator. I generally favor not using illustrator at all, or minimally so.

But why use Illustrator after coloring branches in FigTree, other than for format conversion? If you wish to edit tree tips (spelling, font, etc.), this can be automated with a small script in R, and you can also view/manipulate the tree relatively easily. See the R Hackathon web pages for immensely useful info.
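As a rough idea of how small such a script can be, here is a sketch in Python rather than R, using only the standard library and no tree package. It works directly on a Newick string and assumes unquoted labels with no whitespace or bracketed comments in the string. The names `tip_labels` and `rename_tips` and the example tree are made up for illustration.

```python
import re

# In Newick, a tip label directly follows "(" or ",".
# Internal node labels follow ")" and branch lengths follow ":",
# so neither is matched by this pattern.
_TIP_LABEL = re.compile(r"(?<=[(,])[^():,;]+")


def tip_labels(newick):
    """Return the tip labels in the order they appear in the string."""
    return _TIP_LABEL.findall(newick)


def rename_tips(newick, fix):
    """Return a copy of the Newick string with fix() applied to every tip label."""
    return _TIP_LABEL.sub(lambda match: fix(match.group(0)), newick)


tree = "((Rana_pipens:1.0,Bufo_bufo:2.0)Anura:0.5,Ambystoma:3.0);"
corrections = {"Rana_pipens": "Rana_pipiens"}  # misspelled -> corrected

fixed = rename_tips(tree, lambda name: corrections.get(name, name))
print(tip_labels(fixed))  # ['Rana_pipiens', 'Bufo_bufo', 'Ambystoma']
```

Passing a function rather than a lookup table means the same call can fix spelling, swap underscores for spaces, or append collection codes.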

John Harshman said...

The goal is a poster, printed in one piece, for the upcoming CIPRES meeting.

Jon said...

@John. If you're working from (or have access to) a Mac, you might attempt print... save as .pdf from FigTree -- rather than 'exporting' as .pdf. I've had very good luck with being able to modify taxon-labels in Illustrator through these means.

John Harshman said...

Jon:

Thanks. My Illustrator won't read the resulting file at all (something about a problem with metadata), even if I first save it in Acrobat. But Canvas will, and reads the text as text. So I finally have a solution that lets me print the tree in readable form. (FigTree won't, by the way; it won't print more than one page, and the tree is way too big for that.)

The PDF format doesn't seem quite universally compatible, even within Adobe products. Go figure.

Anonymous said...

Hi,

Sorry I come late to this discussion.

Just an opinion: more than in how to print large trees, the problem is in how to extract the taxonomic information contained in them, or how to make the display easily focus on specific parts. And it seems to me it is not such a terribly difficult problem: some extremely simple solutions may be very helpful.

In that regard, I have just posted on the TNT web site an update of the Windows version of TNT which can handle large tree diagrams and taxonomies (before, this could be done only under 64-bit Linux or Mac). With this, you can look at the results of our recent analysis of 73,060 taxa (from Cladistics 25:211-230) with easy-to-use scripts, concentrating on specific parts of the tree (and automatically determining how close to monophyletic any reference group in the GenBank taxonomy is).

The full instructions to view our trees are here.

Except for those of you studying bacteria or viruses, you can be certain your favorite group is included in the tree --every major group is there.

Katie W. said...

For multi-page printing from FigTree or any other tree viewer that gives you single-page PDF output, I just discovered this nice little program called Tiler. There are other similar programs out there but this one is super basic, opens PDFs, and seems to work like a charm for my purposes: http://www.apple.com/downloads/macosx/system_disk_utilities/mindcadtiler.html

Glor said...

Thanks Katie!
