An ontological approach to describing and synthesizing ecological data, using a generalized model for “scientific observations”
Mark Schildhauer, Matthew Jones, Joshua Madin, Shawn Bowers
Abstract
Research in the ecological and environmental sciences increasingly relies on the integration of traditionally small, focused studies to form larger datasets for synthetic analyses. Ecological data span a broad range of data types, structures, and semantic subtleties. This extreme heterogeneity makes discovery and integration of environmental data a difficult and time-consuming task. By formally defining the notion of “scientific observation”, we have developed an ontology that captures the basic semantics of ecological data required for synthesis. Observations are distinguished at the level of entities (e.g., location, time, thing, concept), and the characteristics of those entities (e.g., area, height, color) are measured (quantified, named, or classified) as data.
Our framework permits observations to be inter-related via context (such as spatial or temporal containment), further enhancing the possibilities for comparison and alignment (e.g., merging) of heterogeneous data. Advanced forms of data discovery and integration are made possible through the use of a semantic annotation language that links observational constructs within the ecological data to concepts drawn from different domain ontologies (e.g., a biodiversity ontology or an ecosystems ontology). Our current framework accomplishes this by enabling a researcher to annotate metadata descriptions of individual datasets (such as those written in Ecological Metadata Language, EML) with appropriate ontological terms.
The generalized approach to modelling “scientific observations” using ontologies that link to metadata descriptions of the raw data provides a powerful, extensible mechanism for enhancing data discovery and integration, allowing scientists to address questions that were previously intractable. Prototype demonstrations of these capabilities are operational within the Science Environment for Ecological Knowledge (SEEK) research project. They are based on open-source technologies and on standards for metadata and ontology construction that are compatible with recommendations from the World Wide Web Consortium (W3C).
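As an informal illustration of the observation model described in this abstract, the following minimal Python sketch represents an observation as an entity, a set of measured characteristics, and an optional link to a containing (context) observation. The class names, field names, and example values are hypothetical and chosen only for illustration; they are not the terms of the ontology itself, and the sketch does not cover semantic annotation of EML metadata.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Measurement:
    """A measured characteristic of an entity (hypothetical structure)."""
    characteristic: str          # e.g. "height", "area", "color"
    value: object                # quantified, named, or classified value
    unit: Optional[str] = None   # None for named or classified values


@dataclass
class Observation:
    """An observation of one entity, optionally placed within a context."""
    entity: str                                   # e.g. "Tree", "Plot"
    measurements: List[Measurement] = field(default_factory=list)
    context: Optional["Observation"] = None       # containing observation

    def context_chain(self) -> List[str]:
        """Return the entities of all enclosing contexts, innermost first."""
        chain = []
        obs = self.context
        while obs is not None:
            chain.append(obs.entity)
            obs = obs.context
        return chain


# Illustrative data: a tree observed within a plot, within a site.
site = Observation("Site", [Measurement("name", "A")])
plot = Observation("Plot", [Measurement("area", 25.0, "square meter")], context=site)
tree = Observation("Tree", [Measurement("height", 12.3, "meter")], context=plot)
```

Under these assumptions, walking the context links of `tree` recovers the spatial containment (plot, then site) that could be used to align it with observations from another dataset recorded at the same plot or site.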