Proceedings of TDWG, 2008

Integrating space, time and form: the challenge of developing standards for geophylogenies.

David Kidd

Abstract


Organisms and their genes evolve across space as well as through time, yet the trees and reticulate graphs with which we represent evolutionary relationships do not currently incorporate the spatial component. Geophylogenies are spatially referenced phylogenetic graphs that explicitly represent spatiotemporal evolutionary hypotheses which can be stored, visualized, and analysed within geographical information systems and Earth browsers (e.g. Google Earth and NASA WorldWind). Future data standards for geophylogenies thus require the integration of phylogenetic, geographic, and temporal standards, with the added complication of continental drift decoupling the connection between geographical place and coordinate systems.

Simple geophylogenies can be created by attaching spatial coordinates to the nodes of a phylogenetic model and then placing the branches so that they follow the shortest geographical path between nodes. If the spatial coordinates refer to an orthogonal (planar) coordinate system, then the shortest path is a straight line. However, if the coordinates refer to a spheroid, then the shortest path between two points is a geodesic (a great circle on a sphere), and the branch path needs to be represented by an arc or segmented line (a polyline; Fig. 1). Segmented branches are also required if the shortest path is not the most appropriate path, for example where an impassable barrier separates a pair of sister taxa.
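
As an illustration of the segmented-branch idea, the following minimal Python sketch densifies a branch between two nodes into polyline vertices along a great circle. It assumes a spherical Earth rather than a spheroid, and the function name `great_circle_polyline` and its parameters are hypothetical, introduced only for this example.

```python
import math


def great_circle_polyline(lon1, lat1, lon2, lat2, n_segments=8):
    """Return n_segments + 1 (lon, lat) vertices, in degrees, along the
    great-circle arc between two points on a sphere (an assumption; a
    spheroid would need a true geodesic solver)."""

    def to_xyz(lon, lat):
        # Convert geographic coordinates to a unit vector.
        lam, phi = math.radians(lon), math.radians(lat)
        return (math.cos(phi) * math.cos(lam),
                math.cos(phi) * math.sin(lam),
                math.sin(phi))

    a, b = to_xyz(lon1, lat1), to_xyz(lon2, lat2)
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)  # angular separation of the two nodes, in radians

    if omega < 1e-12:
        # Coincident nodes: the branch has no geographic extent.
        return [(lon1, lat1)] * (n_segments + 1)
    if math.pi - omega < 1e-9:
        # Antipodal nodes: the great circle is not unique.
        raise ValueError("antipodal points have no unique great-circle path")

    vertices = []
    for i in range(n_segments + 1):
        f = i / n_segments
        # Spherical linear interpolation between the two unit vectors.
        wa = math.sin((1.0 - f) * omega) / math.sin(omega)
        wb = math.sin(f * omega) / math.sin(omega)
        x, y, z = (wa * p + wb * q for p, q in zip(a, b))
        lon = math.degrees(math.atan2(y, x))
        lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
        vertices.append((lon, lat))
    return vertices
```

Each returned vertex could also carry an interpolated time value, so that the branch remains a spatiotemporal rather than a purely spatial path.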

Geophylogenetic nodes may be appropriately represented as a simple single point (x, y, t), for example where graph nodes represent widely dispersed populations. However, such situations are perhaps the exception. Often entities are observed at more than one location or are hypothesised to occupy a certain geographical range, e.g. species and gene ranges. In such cases, nodes may be better represented as a set of point locations, a polyline (e.g. river reaches) or a polygonal geographic region (e.g. a species range). As ranges change through time independently of the distribution of phylogenetic nodes, inferred historic ranges may need to be attached not only to nodes but also to the vertices that separate branch segments.
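
One way to picture these alternatives is as a small data model in which a node carries a geometry of one of several kinds, and branch vertices may optionally carry an inferred range. The Python sketch below is illustrative only: the class and field names (`Geometry`, `GeoNode`, `BranchVertex`, `Branch`) are hypothetical and are not drawn from any existing standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Coord = Tuple[float, float]  # (x, y) in whatever coordinate system is declared

# Minimum number of coordinates each geometry kind needs to be meaningful.
_MIN_COORDS = {"point": 1, "multipoint": 1, "polyline": 2, "polygon": 3}


@dataclass
class Geometry:
    kind: str            # "point", "multipoint", "polyline" or "polygon"
    coords: List[Coord]

    def __post_init__(self):
        if self.kind not in _MIN_COORDS:
            raise ValueError(f"unknown geometry kind: {self.kind}")
        if len(self.coords) < _MIN_COORDS[self.kind]:
            raise ValueError(f"{self.kind} needs more coordinates")
        if self.kind == "point" and len(self.coords) != 1:
            raise ValueError("a point has exactly one coordinate")


@dataclass
class GeoNode:
    name: str
    time: float          # t, in units fixed by the chosen temporal datum
    geometry: Geometry   # single location, set of locations, reach or range


@dataclass
class BranchVertex:
    position: Coord
    time: float
    inferred_range: Optional[Geometry] = None  # historic range, if inferred


@dataclass
class Branch:
    parent: GeoNode
    child: GeoNode
    vertices: List[BranchVertex] = field(default_factory=list)
```

Keeping the range on the vertex rather than on the branch as a whole lets a single branch record several successive range hypotheses between its two nodes.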

Modern geographic datums are suitable for shallow time, but as time deepens and continental drift breaks the one-to-one mapping between place and datum coordinates, current coordinates will need to be transformed to palaeo-coordinates. The palaeo-model on which such transformations are based, as well as the temporal datum (stratigraphic or radiometric) from which dates have been obtained, must be part of any standard.
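
Plate reconstructions commonly express such transformations as finite rotations of a plate about an Euler pole. The Python sketch below shows only the geometric step, using Rodrigues' rotation formula on a spherical Earth. The function name and parameters are hypothetical, and the pole and angle must come from whichever palaeo-model is adopted; no real model values are implied here.

```python
import math


def rotate_about_euler_pole(lon, lat, pole_lon, pole_lat, angle_deg):
    """Rotate a point (degrees) about an Euler pole by angle_deg,
    counter-clockwise when viewed from above the pole (right-hand rule).
    Assumes a spherical Earth."""

    def to_xyz(lo, la):
        # Convert geographic coordinates to a unit vector.
        lam, phi = math.radians(lo), math.radians(la)
        return (math.cos(phi) * math.cos(lam),
                math.cos(phi) * math.sin(lam),
                math.sin(phi))

    v = to_xyz(lon, lat)
    k = to_xyz(pole_lon, pole_lat)  # rotation axis through the Euler pole
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)

    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])

    # Rodrigues' formula: v*cos(t) + (k x v)*sin(t) + k*(k.v)*(1 - cos(t))
    x, y, z = (v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c)
               for i in range(3))

    new_lon = math.degrees(math.atan2(y, x))
    new_lat = math.degrees(math.asin(max(-1.0, min(1.0, z))))
    return new_lon, new_lat
```

Because the result depends entirely on the pole and angle supplied, a transformed coordinate is only interpretable if the palaeo-model and the temporal datum are recorded alongside it.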

The geophylogenetic data model is a new development and as such can take advantage of existing and emerging standards, e.g. NeXML and phyloXML for phylogenetic data in XML, and the Geography Markup Language (GML) for spatial data. If embraced by the scientific community, geophylogenies have considerable potential to reinvigorate historical biogeography and will underlie any future Map of Life in which the Tree of Life is threaded through Earth history.