Biological Descriptions (BD) Interest Group Charter

Convener

Gregor Hagedorn,
Biologische Bundesanstalt fur Land und Forstwirtschaft, Konigin-Luise-Str. 19, 14195 Berlin, Germany.
Email: g.m.hagedorn(at)gmail.com

Core Members

  • Gregor Hagedorn (BBA, Germany)
  • Robert Morris (UMass, Boston, USA),
  • Kevin Thiele (Western Australian Herbarium, Perth, Australia),
  • Bryan Heidorn (University of Illinois, USA)
  • Donald Hobern (GBIF, Denmark)

Summary

The goal of the group is to develop standard computer-based mechanisms for expressing and transferring descriptive information about biological specimens, taxa, and similar entities such as diseases or ecosystems. The exchanged data may include terminologies, descriptions, identification data and associated resources (taxon names, specimen and publication references, etc.). The term "descriptive data" as used here is about descriptions of life on earth and limited to inherent (taxon specific) properties or traits. It is not about legal, management, event, nomenclatural, or other data that may be relevant in the study of biodiversity.

The developed standards are developed to support capture, transport, caching and archiving of descriptive data, using platform- and application-independent means. Such a standard is crucial to enabling lossless porting of data between existing and future software platforms including identification, data-mining and analysis tools, and federated databases.

Motivation

This group is concerned with the management and exchange of observations or measurements of the taxon-specific properties of free-living or collected organisms or other biological entities (bacteria, viruses, diseases, etc.). Taxon-specificity implies that the data are at least partly genetically coded (although the expression of these properties may be modified by the environment or circumstances). Such data are relevant in many contexts; the primary motivation of many members of the groups is the identification of taxa in a taxonomic, ecological, educational, recreational, or applied context. In addition, these data are highly relevant for genetic (studying the relation between genes and their expression) and phylogenetic (studying evolutionary relationships) studies.

All descriptive data originate in observations and measurements of biological individuals (observations, specimens, cultures, etc.). These data may be aggregated and synthesized into descriptions of infraspecific, specific, or higher taxa. Taxon descriptions are much more commonly used for identification and phylogenetic analysis than descriptions of individuals ("specimen descriptions"). However, a goal of this interest group is to unify these forms of descriptions so that individual and taxon descriptions can be integrated, and individual data cited for or aggregated into taxon descriptions. Descriptive data take a number of different forms, which are considered the core objects of this group:
  • Free-form natural-language descriptions are semi-formalized descriptions of a taxon, or occasionally of specimens. They may be simple, short, and written in plain language (e.g. as found in popular field guides), or long, formal, and using specialized terminology (e.g. as found in taxonomic monographs or other treatments).
  • Semi-structured natural language data are free-form text descriptions that are improved by manually or automatically adding information to it ("tagging" or "markup").
  • Branching ("dichotomous" or "polytomous") identification keys are specialized identification tools comprising fragments of descriptive data arranged in so-called "couplets" to form a branching tree. Each fragment (traditionally called a "lead") usually comprises a small natural-language description using one or several characters, pointing either to the next node in the key or ending with a taxon name. In contrast to the multi-access keys that may be based on coded descriptions (see below), branching identification keys require users to follow a fixed sequence of questions.
  • Coded descriptions comprise highly structured data used in computer programs for biological identification (especially multi-access identification keys) or for phylogenetic analysis. Coded data may include original (sample) data as well as aggregated summary (taxon) data.
Descriptive data therefore range from unstructured natural language to highly structured (coded) data. Unlike specimen databases and name services, taxon descriptions and identification keys often reside in dispersed and independent documents, publications, or small repositories. These are normally provided by individual scientists or small teams and only rarely by large organizations.

Each set of descriptive data typically uses its own independent biological terminology (also called an "ontology"). Data standards for descriptive data must be designed to accommodate these independent ontologies, but also provide means to support (voluntary) standardization of ontologies. The goal of the standards created by this group is to support the capture, transport, caching and archiving of descriptive data in all the forms mentioned above, using a platform- and application-independent international standard. Such a standard is crucial to enabling lossless data exchange between existing and future software platforms, including identification, data-mining and analysis tools, and federated databases.

A first attempt for such a standard is the result of the SDD Schema task group (see charter of schema task group). We expect that the development of standards like the SDD xml-schema will contribute to the development of structured online monographs or species pages that include descriptive data as well as other biodiversity data.

The Interest Group maintains a repository containing discussions, decisions, schemas, sample documents, and selected open-source tools for the support of the use of SDD.

Becoming Involved

  • All interested parties are encouraged to become involved with the interest group or specific task groups. Membership of these group is open. If you are involved in biological identification, creation or editing of digital flora or faunas, taxon descriptions, etc. and you would like to share your data with others, SDD may need your contribution.
  • If you plan a software development involving exchange of descriptive data, please contact a core member to determine if we have or know about any code or exchange standard that may help your project. This will help to use common solutions where these exist, point out omissions not yet recognized, or help prioritize known omissions.
  • If you detect problems with using the SDD Schema, please document them, either directly on the SDD Wiki (http://wiki.tdwg.org/twiki/bin/view/SDD), or submit your critique by email to become placed on the Wiki on your behalf.
  • If you believe a new task group is needed to address specific developments with SDD, please contact any of the Core Members about starting a new Task Group. SDD is not limited to the current XML-schema based effort!
  • The group is particularly interested in attracting people with experience in the use of Semantic Web technologies. Please contact the convener or any core member about how to become involved.

History and Context

History and context of SDD: http://wiki.tdwg.org/twiki/bin/view/SDD/HistoryAndContext

Resources


  Last Modified: 26 November 2006