An introduction to biological descriptions
Kevin Thiele
Abstract
Capturing biological descriptions in a standard format is a hard problem; indeed, I will argue that descriptions are an order of magnitude harder than most other areas of biological standardization. This is because any standard must include both the descriptive data themselves and an ontology (the descriptors) from which the descriptions gain meaning. Descriptors are domain-specific – descriptors needed for beetles will necessarily be different from those for orchids (in fact, two beetle families may well require very different descriptors). Hence, an ontology for life is impossible – or is it? The principal challenge in descriptive standardization, as in all TDWG activities, is to create a standard that allows descriptive data from a variety of sources to be combined, and that allows a bounded inference that “yellow” and “rounded” in descriptions from two sources mean the same thing. This in turn requires agreement on normativity and the capture of normativity statements. Once descriptive data are captured in standard form they can be used for many purposes, including the generation of structured natural language, interactive identification keys and research into the distributions and ecology of features of interest. The TDWG Structure of Descriptive Data standard goes a long way toward solving the difficult issue of capturing descriptive data, but more work is needed.
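As a rough illustration of why the descriptors must travel with the data, the following minimal Python sketch keeps a small shared set of descriptors and only combines two descriptions when every scored state can be interpreted against it. All names here (`DESCRIPTORS`, `validate`, `combine`, the descriptor ids and the example taxon) are hypothetical and chosen for illustration; this is not the TDWG Structure of Descriptive Data format, and it does not attempt to model normativity statements.

```python
# Hypothetical shared descriptors (the "ontology"): descriptor id -> allowed states.
DESCRIPTORS = {
    "petal_colour": {"yellow", "white", "red"},
    "leaf_apex": {"rounded", "acute"},
}


def validate(description):
    """Return (descriptor, state) pairs the shared descriptors cannot interpret.

    An unknown descriptor is reported as (descriptor, None).
    """
    problems = []
    for descriptor, states in description["scores"].items():
        allowed = DESCRIPTORS.get(descriptor)
        if allowed is None:
            problems.append((descriptor, None))
            continue
        for state in sorted(states - allowed):
            problems.append((descriptor, state))
    return problems


def combine(a, b):
    """Merge two descriptions of the same taxon, refusing uninterpretable data."""
    if a["taxon"] != b["taxon"]:
        raise ValueError("descriptions refer to different taxa")
    for description in (a, b):
        problems = validate(description)
        if problems:
            raise ValueError(f"cannot interpret scores: {problems}")
    merged = {}
    for description in (a, b):
        for descriptor, states in description["scores"].items():
            merged.setdefault(descriptor, set()).update(states)
    return {"taxon": a["taxon"], "scores": merged}


# Two sources that score against the same descriptors can be combined.
source_a = {
    "taxon": "Example species",
    "scores": {"petal_colour": {"yellow"}, "leaf_apex": {"rounded"}},
}
source_b = {
    "taxon": "Example species",
    "scores": {"petal_colour": {"yellow", "white"}},
}
combined = combine(source_a, source_b)

# A third source uses a local term ("golden") that the shared descriptors lack.
source_c = {"taxon": "Example species", "scores": {"petal_colour": {"golden"}}}
unresolved = validate(source_c)
```

In this sketch, "yellow" from the two sources is treated as the same state only because both score against the same descriptor definition; the local term "golden" is flagged rather than silently merged, which is the kind of gap that agreed descriptors and normativity statements would need to close.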