Proceedings of TDWG, 2007

SPM from an SDD perspective: Generality and extensibility

Gregor Hagedorn

Abstract


The Species Profile Model (SPM) currently under development is urgently needed to integrate descriptive data with the new TDWG models based on semantic web technologies.

The Structured Descriptive Data (SDD) model (standardized by TDWG in 2005) is not, so far, based on semantic web technologies, but it was developed explicitly to integrate a wide range of data types, from categorical ("character states") to quantitative (including statistical measures such as average, variance and sample size). It provides mechanisms for handling thousands of characters and other terms, was designed to support federated systems, and contains an extensible data type system.

Most importantly, SDD succeeds in integrating: (a) natural language markup (ranging from breakdown into “subject fields”, as considered by SPM, to detailed markup of concepts, characters and values in legacy literature); (b) coded descriptions (matrices as known from the DELTA or NEXUS encoding systems for taxonomic descriptions); and (c) original sample data.
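To illustrate how original sample data (c) relate to a coded quantitative value with statistical measures (b), the following is a minimal sketch in Python. It is not SDD, which is defined as an XML schema; the names `QuantitativeSummary` and `summarize_samples`, the measure keys and the sample values are hypothetical and chosen only for illustration.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class QuantitativeSummary:
    """Hypothetical coded value for one quantitative character of one taxon."""
    character: str  # e.g. "leaf length"
    unit: str       # e.g. "mm"
    measures: dict[str, float] = field(default_factory=dict)


def summarize_samples(character: str, unit: str, values: list[float]) -> QuantitativeSummary:
    """Derive statistical measures from raw sample measurements."""
    if len(values) < 2:
        raise ValueError("need at least two samples to estimate a variance")
    return QuantitativeSummary(
        character=character,
        unit=unit,
        measures={
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "variance": statistics.variance(values),  # sample variance (divisor n - 1)
            "n": len(values),                           # sample size
        },
    )


# Illustrative sample data, not real measurements.
summary = summarize_samples("leaf length", "mm", [12.0, 14.0, 13.0, 15.0])
print(summary.measures)
```

Keeping the measures in an open dictionary rather than in fixed fields is one way to leave room for a larger vocabulary of statistical measures, which is the kind of extensibility the SDD data type system aims at.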

SPM, while aiming for simplicity, has already started to become complex, expanding from natural language text into structured data such as categorical and quantitative measurements. With respect to the order in which data types and semantics are defined, the approach taken by SPM seems to be the opposite of that taken by SDD. It is difficult to judge how complex the model will become as it develops further, and how well it will scale when dealing with thousands of descriptive concepts.

Important topics for investigation include (a) how SPM can be extended to support multiple descriptions of different scope per taxon (e.g., geographic, seasonal, stage-specific); (b) how categorical values can be extended with modifiers (“perhaps”, “frequently”, “at tip”, “in winter”) or free-form text comments; (c) how a rich vocabulary of statistical measures can best be introduced; and (d) how to address free-form text markup (especially of text arranged differently from the current SPM categories) and the distinction between sequence and set data.
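Points (a) and (b) can be made concrete with a small data structure. The sketch below is a minimal Python illustration under stated assumptions: it assumes that scope is a set of key/value pairs attached to each description and that modifiers are plain strings attached to a categorical state. The names `StateValue`, `Description` and `descriptions_in_scope`, the taxon "Taxon A" and the example states are hypothetical and are not part of SPM or SDD.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class StateValue:
    """Hypothetical categorical state with optional modifiers and a free-text comment."""
    state: str
    modifiers: tuple[str, ...] = ()
    comment: Optional[str] = None

    def render(self) -> str:
        text = self.state
        if self.modifiers:
            text += " [" + "; ".join(self.modifiers) + "]"
        if self.comment:
            text += f" ({self.comment})"
        return text


@dataclass
class Description:
    """Hypothetical description: one of possibly several per taxon, each with its own scope."""
    taxon: str
    scope: dict[str, str] = field(default_factory=dict)  # e.g. {"season": "winter"}
    categorical: dict[str, list[StateValue]] = field(default_factory=dict)

    def add_state(self, character: str, value: StateValue) -> None:
        self.categorical.setdefault(character, []).append(value)

    def render(self, character: str) -> str:
        values = self.categorical.get(character, [])
        return f"{character}: " + ", ".join(v.render() for v in values)


def descriptions_in_scope(descriptions: list[Description], **scope: str) -> list[Description]:
    """Return the descriptions whose scope contains all given key/value pairs."""
    return [d for d in descriptions if all(d.scope.get(k) == v for k, v in scope.items())]


# Illustrative data only: two seasonal descriptions of the same taxon.
summer = Description("Taxon A", {"season": "summer"})
winter = Description("Taxon A", {"season": "winter"})
summer.add_state("leaf colour", StateValue("green"))
winter.add_state("leaf colour", StateValue("red", ("perhaps",)))
winter.add_state("leaf colour", StateValue("brown", ("frequently", "at tip"), "older leaves"))

for d in descriptions_in_scope([summer, winter], season="winter"):
    print(d.render("leaf colour"))
```

Storing values per character as a list preserves their order, so a sequence can still be distinguished from a set if the model needs it; whether modifiers such as “in winter” belong in a value or in the scope of a separate description is exactly the kind of design choice raised in (a) and (b).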

It is hoped that the requirements of SDD and SPM can be merged, so that a common platform for future development can be maintained.