Challenges and tradeoffs in the management of geological context data in paleontological collections.
Paul J. Morris
Abstract
Paleontologists collect fossils from exposures of rocks on the Earth's surface and by other means such as drill cores. As these fossils are curated into museum collections, information systems need to handle the geological context from which the fossils were collected. The geological context for a fossil may include the geologic time unit from which it was collected, a description of the rocks in which it was found, the formal (lithostratigraphic) name of the rock unit, and a wide variety of geologic zones that represent fine-grained hypotheses about the ordering of events in geologic time. There are multiple ways to handle these data in relational databases and in data interchange standards. I will examine several approaches to the management of information related to the geological context for paleontological specimens and the tradeoffs that result from each choice.
In abstract terms, the geological context of a fossil is an attribute of the place or locality from which it was collected. In practice, geologists think of a locality as a place in a two-dimensional coordinate system where a slice of geologic time is exposed, and where multiple rock units and geologic time units may be exposed and collected. Treating geological context as an attribute of a locality may result in the entry of redundant locality records that differ only in their geology, and may create problems for the management of legacy data with poorly resolved geographic and geologic data. Geological context may instead be treated as an attribute of a specimen. This may work well for the capture of legacy data and for collections where material enters as bulk samples that are tracked and processed, but it may make it very difficult to manage changes in hypotheses about the geological context of an outcrop or a drill core. Geological context can also be treated as an attribute of a collecting event. Collecting events are often unknown for legacy data, and tying geological context to a collecting event creates challenges for cleaning and editing legacy data by raising the possibility of introducing errors that propagate through many related specimens. However, the nature of a collecting event as a sampling visit to a locality at a point in time may make it the best fit for geological context information.
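As a rough illustration of the third option, the following Python sketch attaches geological context to a collecting event rather than to a locality or a specimen. All class and field names (GeologicalContext, Locality, CollectingEvent, Specimen, and their attributes) and all values are hypothetical, chosen for illustration only; they are not drawn from any particular collections database or exchange standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeologicalContext:
    # Field names are illustrative assumptions, not a real schema.
    time_unit: Optional[str] = None   # geologic time unit
    rock_unit: Optional[str] = None   # formal (lithostratigraphic) name
    lithology: Optional[str] = None   # description of the rocks
    zone: Optional[str] = None        # fine-grained geologic zone


@dataclass
class Locality:
    # A place only; it carries no geology of its own in this sketch.
    name: str
    latitude: Optional[float] = None
    longitude: Optional[float] = None


@dataclass
class CollectingEvent:
    # A sampling visit to a locality; the date is often unknown for legacy data.
    locality: Locality
    date: Optional[str] = None
    context: Optional[GeologicalContext] = None


@dataclass
class Specimen:
    catalog_number: str
    event: CollectingEvent

    def geological_context(self) -> Optional[GeologicalContext]:
        # Specimens read their context through the collecting event.
        return self.event.context


# Hypothetical data: one outcrop, two visits that sampled different rock units.
outcrop = Locality("Hypothetical Quarry")
visit = CollectingEvent(
    outcrop,
    context=GeologicalContext(time_unit="Devonian", rock_unit="Lower Unit (hypothetical)"),
)
later_visit = CollectingEvent(
    outcrop,
    context=GeologicalContext(rock_unit="Upper Unit (hypothetical)"),
)
specimens = [Specimen(f"EX-{n}", visit) for n in range(1, 4)]

# A single edit on the event reaches every specimen tied to that event.
visit.context.zone = "Zone A (hypothetical)"
```

In this arrangement both visits share one locality record, so no redundant localities are needed to hold different geology. The single edit to the event's zone also shows the risk noted above: a mistaken edit would propagate to all three specimens at once.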
A geological context may be thought of as a single objective attribute related to a specimen, or as a hypothesis that can change over time. Large-scale geological information (placement in a high-level geologic time unit or in a high-level rock unit) is not likely to change over time, whereas fine-scale divisions of geologic time and rocks are hypotheses that are more likely to change. The relationship of a specimen to a geological context can be thought of as a determination made at some point in time, with a history that changes as the placement of the boundaries of time and rock units changes. Treating the relationship between a collecting event and a geological context as a one-to-many relationship makes for less complexity (in databases, code, exchange standards, and user interfaces) than does treating it as a many-to-many relationship. There may or may not be enough data of high enough quality to merit tracking histories of changes in the geological context of a specimen.
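The idea of a geological context as a dated determination can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommended schema: the names GeologicalDetermination, determined_on, determined_by, determine, and current are all hypothetical, as are the zone labels and dates.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class GeologicalDetermination:
    # One hypothesis about geological context, made at a point in time.
    zone: str
    determined_on: date
    determined_by: str


@dataclass
class CollectingEvent:
    label: str
    # Keeping every determination preserves the history of hypotheses.
    determinations: List[GeologicalDetermination] = field(default_factory=list)

    def determine(self, determination: GeologicalDetermination) -> None:
        self.determinations.append(determination)

    def current(self) -> Optional[GeologicalDetermination]:
        # Treat the most recently dated determination as the current one.
        if not self.determinations:
            return None
        return max(self.determinations, key=lambda d: d.determined_on)


# Hypothetical history: a zone boundary is revised and the event is re-determined.
event = CollectingEvent("Hypothetical Quarry, bed 3")
event.determine(GeologicalDetermination("Zone A", date(1985, 1, 1), "original collector"))
event.determine(GeologicalDetermination("Zone B", date(2010, 1, 1), "later reviser"))
```

Keeping only a single context field per event would be simpler to store, exchange, and present, at the cost of discarding the earlier determination. Whether the history is worth the added complexity depends on the quality of the data available.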