Thursday, February 14, 2013

Marine Klee-diagrams (1)

Paul Klee - Ancient Sound
I have touched on this topic before. Methods for phylogenetic visualization have typically been developed for small phylogenetic trees, but new methods are required as efforts to resolve the tree of life proceed and sequence datasets grow. Many common software tools for visualising small phylogenetic trees already exist. These tools lay out trees in two-dimensional Euclidean space and are useful for visualising trees of up to a few hundred nodes. Some software tools (e.g. FigTree, HyperTree) have increased the number of depictable nodes by using 2D hyperbolic space, or by visualising trees in 3D hyperbolic space (Walrus).

However, the larger datasets become, the more difficult it is to provide a panoramic view of the entire dataset. Indicator vectors of sequences visualized in so-called Klee-diagrams have the potential to overcome this limitation of tree methods. I would like to show a few examples here on the blog to demonstrate the power of Klee-diagrams for displaying larger DNA Barcode datasets.

For my Klee-diagrams I used a mathematical approach to the comparative analysis of nucleotide sequences, based on a digital transformation into vector space. Essentially, DNA data are transformed into vectors. A distinguishing vector that is indicative of a specific group of organisms can then be calculated from the transformed DNA sequence information (the number n of members in such a group can be defined). These so-called indicator vectors can be constructed for different taxonomic levels or other interesting groupings. All this is implemented in a MATLAB routine available at Mark Stoeckle's barcode site. Matrices of correlations among the indicator vectors can be displayed as false-color maps (Klee-diagrams) using the MATLAB graph functions. Note that the input could also be any other set of values, e.g. a distance matrix.
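To make the idea more concrete, here is a minimal Python sketch of the general workflow: encode sequences as vectors, derive one vector per group, and correlate the group vectors. This is not the MATLAB routine mentioned above. The one-hot encoding, the centring on the overall mean, and all function names are my own simplifying assumptions, and the published method may compute its indicator vectors differently.

```python
import math

BASES = "ACGT"

def encode(seq):
    """One-hot encode an aligned DNA sequence (4 numbers per site)."""
    vec = []
    for base in seq.upper():
        # Gaps and ambiguity codes become an all-zero block.
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec

def mean_vector(vectors):
    """Element-wise mean of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def group_vectors(groups):
    """Return one centred, unit-length vector per group.

    `groups` maps a group name to a list of aligned sequences.
    This is a simplified stand-in for an indicator vector.
    """
    means = {name: mean_vector([encode(s) for s in seqs])
             for name, seqs in groups.items()}
    overall = mean_vector(list(means.values()))
    result = {}
    for name, vec in means.items():
        centred = [v - o for v, o in zip(vec, overall)]
        norm = math.sqrt(sum(c * c for c in centred))
        result[name] = [c / norm for c in centred] if norm else centred
    return result

def correlation_matrix(vectors):
    """Pairwise dot products of the unit-length group vectors."""
    names = list(vectors)
    matrix = [[sum(a * b for a, b in zip(vectors[r], vectors[c]))
               for c in names] for r in names]
    return names, matrix

# Toy data: three made-up "species" with two aligned sequences each.
groups = {
    "sp_A": ["ACGTAC", "ACGTAA"],
    "sp_B": ["ACGTAT", "ACGAAT"],
    "sp_C": ["TTGCAC", "TTGCGC"],
}
names, matrix = correlation_matrix(group_vectors(groups))
```

The resulting `matrix` is what would be drawn as a false-color map, for example with a heat-map function of a plotting library. In this toy example sp_A correlates more strongly with sp_B than with sp_C, which is the kind of block structure a Klee-diagram makes visible.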

While the order of sequences in an alignment does not affect the actual calculations, for the resulting Klee diagram it is useful to arrange the sequences so that they approximate evolutionary relationships. Therefore, I ordered the data according to the topology of Neighbor Joining trees (here constructed with MEGA 5). The re-ordering of my alignments was done with a customized Tree Parser routine.
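The re-ordering step could look roughly like the following Python sketch. It is not the Tree Parser routine I used. It assumes the tree is available as a Newick string with plain, unquoted leaf names that match the sequence names in the alignment, and the function names are made up for illustration.

```python
import re

def leaf_order(newick):
    """Return leaf names in the order they appear in a Newick string.

    A leaf name directly follows "(" or ",". Internal node labels and
    branch lengths follow ")" or ":" and are therefore skipped.
    Quoted names are not supported.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)

def reorder_alignment(alignment, newick):
    """Return (name, sequence) pairs sorted to follow the leaf order."""
    order = leaf_order(newick)
    missing = [name for name in order if name not in alignment]
    if missing:
        raise KeyError(f"sequences missing from alignment: {missing}")
    return [(name, alignment[name]) for name in order]

# Toy example with made-up names and sequences.
tree = "((sp_B:0.1,sp_D:0.2):0.05,(sp_A:0.1,sp_C:0.3):0.02);"
alignment = {
    "sp_A": "ACGTAC",
    "sp_B": "ACGTAT",
    "sp_C": "TTGCAC",
    "sp_D": "ACGAAT",
}
reordered = reorder_alignment(alignment, tree)
```

Here `reordered` lists the sequences in tree order (sp_B, sp_D, sp_A, sp_C), so that neighbours in the tree end up next to each other along the axes of the diagram.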

Over the last couple of years I have dedicated a large amount of my time to marine DNA Barcoding, so it seems logical to use some of the data we collected over 4 years to showcase the Klee-diagrams. By the way, all figures are hyperlinked (via the captions) and available on figshare.
Klee-diagram depicting indicator vectors (n=3) for 5000 marine species DNA Barcodes representing 10 phyla.

The figure shows a Klee-diagram that was constructed using marine COI barcodes publicly available on BOLD. Blocks of high correlation on the diagonal reflect affinities within groups of species, corresponding to taxonomic divisions.

Major marine groups are clearly separated in the diagram. Although COI usually fails to resolve intermediate taxonomic levels, it performs surprisingly well at resolving the major marine phyla in this dataset. Rapidly evolving sites appear saturated, while more constrained sites are sufficiently variable to be phylogenetically useful. Thus, it could be argued that the levels of divergence examined here mostly fall within windows in which rapidly evolving sites are too saturated to be informative, while slowly evolving sites are variable enough to provide phylogenetic signal on two levels.

Klee diagrams based on only one gene fragment cannot replace in-depth multi-gene phylogenetic analysis, but it is conceivable that heat-map based visualization can overcome the inadequacies of large-scale trees. Topologies generated through complex multi-gene algorithms could be translated into such diagrams as well.

An advantage of the method is its scalability. The figure below compares two groups of marine invertebrates, echinoderms and polychaetes, based on the COI gene. DNA Barcoding has been shown to be an effective, accurate and useful method of species diagnosis for all five classes of Echinodermata. In addition, our Klee diagram reveals discontinuities corresponding to higher-level taxonomic divisions (left diagram). Furthermore, some areas of high correlation indicate species groups that exhibit low barcode divergence due to rather recent speciation events.
Klee diagrams for 2 groups of marine invertebrates. a. Indicator vector correlation (n=2) for 560 echinoderm species. b. Indicator vectors (n=2) for 375 polychaete species.

Indeed, some of the crinoid species (in the left diagram, upper left block) have recently been identified as Antarctica's first example of a marine invertebrate species flock. Similar species complexes are discussed for the asteroid genus Henricia, which is also shown in the diagram (left diagram, position 170-195).

The diagram on the right shows that COI is not able to resolve the major groups within the Polychaeta. Traditionally, 18S rRNA has been used to provide phylogenies that resolve the divisions within the polychaetes. Many species thought to have broad distributions have turned out to be complexes of allied species, which often reflects the limitations of conventional taxonomy rather than actual cosmopolitanism. Also, polychaetes in general are thought to be paraphyletic, and the lack of distinctness in the diagram might reflect an overall unresolved taxonomy. However, it needs to be pointed out that the method used to calculate the indicator vectors relies on the rather arbitrary grouping by species identifications, which could mask true diversity patterns in some cases.

...to be continued

1 comment:

  1. Starting a Wikipedia article on these ideas - would be nice if you could weigh in at https://en.wikipedia.org/wiki/Klee_diagram
