Proceedings of TDWG, 2008

The Significance of Observations to Biodiversity Studies.

Steve Kelling

Abstract


As the types of data included in biodiversity clearinghouses expand beyond the traditional realm of natural history collections, both opportunities and challenges arise. More data provides a greater opportunity for synthetic analysis across broad spatial and temporal landscapes, but because these data are collected in different ways, more care is required when they are repurposed.
The bird monitoring community classifies species occurrence data by protocol, project design, and data analysis opportunities, and this approach may be more broadly applicable. Protocols are an essential part of species occurrence data because they define the context in which the data were collected and make it possible to combine observations made by many participants at many locations. Because protocols directly influence project design and analysis, species occurrence data can be classified into three general categories:
Directed surveys are used when a priori knowledge of a given system or biological mechanism already exists. Their design attempts to control for known sources of variation while sampling one or a few well-defined variables. As such, directed surveys are the form of observational data collection that most closely resembles experimental studies.
Broad-scale surveys generate probabilistic estimates of species occurrence. They do not provide direct evidence of causation, but they allow inferences about the causes of species occurrence. Broad-scale surveys gather tens of millions of observations annually and provide the bulk of the non-specimen observational data available.
Biological collections comprise zoological, botanical, and paleontological specimens in museums, living collections in botanical or zoological gardens, and microbial strain and tissue collections. They are the foundation of taxonomy and of the historical record of species occurrence. While specimen collections have mostly been used for taxon-oriented research, they have also been used for predictive modeling of species occurrence.
Each category of species occurrence data has its own issues. Directed surveys are expensive and are conducted on small spatial and temporal scales. Broad-scale surveys often gather data opportunistically, rely on volunteer participation, and use less stringent protocols. Biological collections often provide presence-only data with significant and often undocumented sampling biases.
While important inroads have been made in organizing broad-scale survey and biological collection data via standardized schemas, few directed surveys have been added to data clearinghouses. They represent a huge untapped biodiversity data resource, but integrating them with broad-scale surveys or biological collections will require more information at the individual record level. For example, records must include the explanatory factors needed to understand the processes used to collect the data (i.e., protocol design and sampling plan). Consequently, individual data records will need to include both the biologically relevant factors that affect organisms' distribution and abundance and information on the factors that affect the data collection process. With sufficient additional record-level data, potential sources of bias can be investigated and accounted for, thus increasing the significance of the analysis or visualization process.
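As a rough illustration of this last point, the sketch below models a single occurrence record that carries both the biological observation and record-level information about how it was collected, and flags records whose collection process is undocumented. The class, the field names, the required-field list, and the `can_integrate` check are all hypothetical assumptions for illustration; they are not a schema proposed in this abstract or by any clearinghouse, and the two example records are invented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    """The three general categories of species occurrence data."""
    DIRECTED_SURVEY = "directed survey"
    BROAD_SCALE_SURVEY = "broad-scale survey"
    BIOLOGICAL_COLLECTION = "biological collection"


@dataclass
class OccurrenceRecord:
    # Biologically relevant factors: what was observed, where, and when.
    species: str
    latitude: float
    longitude: float
    date: str  # ISO 8601 date string, e.g. "2008-06-14"
    category: Category
    # Collection-process factors (hypothetical fields); None means undocumented.
    protocol: Optional[str] = None          # e.g. "point count", "transect"
    effort_minutes: Optional[float] = None  # time spent observing
    observers: Optional[int] = None         # number of participants


# Hypothetical minimum set of process fields required before records
# from different categories are combined in one analysis.
REQUIRED_PROCESS_FIELDS = ("protocol", "effort_minutes", "observers")


def missing_process_fields(record: OccurrenceRecord) -> list[str]:
    """Return the names of required process fields that are undocumented."""
    return [name for name in REQUIRED_PROCESS_FIELDS
            if getattr(record, name) is None]


def can_integrate(record: OccurrenceRecord) -> bool:
    """True if the record documents how it was collected, so that sampling
    bias can be investigated when it is combined with other data."""
    return not missing_process_fields(record)


# Invented example records, for illustration only.
survey = OccurrenceRecord("Wood Thrush", 42.48, -76.45, "2008-06-14",
                          Category.DIRECTED_SURVEY,
                          protocol="point count", effort_minutes=10, observers=1)
specimen = OccurrenceRecord("Wood Thrush", 42.44, -76.50, "1931-05-20",
                            Category.BIOLOGICAL_COLLECTION)

print(can_integrate(survey))             # True
print(missing_process_fields(specimen))  # ['protocol', 'effort_minutes', 'observers']
```

In this sketch a specimen record without a documented protocol is flagged rather than discarded, so it can still be used where presence-only data suffice; what counts as sufficient record-level information would in practice depend on the intended analysis.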