Proceedings of TDWG, 2008

DataNetONE: A Distributed Environmental Data Archive

Matthew B. Jones

Abstract


Ecological, environmental, and earth science studies that elucidate the patterns and processes controlling biodiversity are fundamental to science-based, long-term management of the world's ecosystems. This understanding is built through scientific programs that collect data on the distribution and abundance of organisms and on the natural environment in which they exist. Currently these data are collected by a wide variety of academic institutions and government agencies and are scattered throughout the world, with no mechanism for federating them so that they are broadly accessible. We have proposed to build the DataNet Observation Network for Earth (DataNetONE), a global, distributed data archive spanning the environmental sciences that will make the wealth of scientific observations of the earth available for scientific studies. DataNetONE has been proposed by a consortium of institutions that are active in scientific data archiving, including ecological research centers, field station networks, government agencies, and libraries.

DataNetONE will consist of a large number of geographically distributed Member Nodes that house data archives and the metadata describing those data. Member Nodes will be linked together by contributing metadata to a set of replicated Coordinating Nodes, which provide shared services to the Member Nodes. The Coordinating Nodes will provide a common infrastructure for, among other things, distributed authentication; fault tolerance; geographic, taxonomic, and temporal search; and data replication. The Coordinating Nodes will also monitor the health and accessibility of the Member Nodes and use that information to drive a replication service that ensures all data are replicated among geographically separated institutions. This replication service will ensure the long-term preservation of data and make the data accessible in different areas of the world.
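
As a rough illustration of the replication idea described above, the following Python sketch shows how a Coordinating Node might choose replica targets for a data set. It is not the DataNetONE design: the names MemberNode and plan_replicas, the coarse region label, the boolean health flag, and the default of two copies are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberNode:
    """Hypothetical record a Coordinating Node might keep for each Member Node."""
    node_id: str
    region: str    # assumed coarse geographic label, e.g. a continent code
    healthy: bool  # assumed outcome of the most recent health check


def plan_replicas(origin, nodes, copies=2):
    """Choose up to `copies` healthy nodes, each in a region not yet holding the data.

    The origin's own region counts as already used, so every replica lands
    at a geographically separate location.
    """
    used_regions = {origin.region}
    targets = []
    # Sort by identifier so the plan is deterministic for a given node list.
    for node in sorted(nodes, key=lambda n: n.node_id):
        if len(targets) == copies:
            break
        if node.node_id == origin.node_id or not node.healthy:
            continue
        if node.region in used_regions:
            continue
        targets.append(node)
        used_regions.add(node.region)
    return targets
```

Calling plan_replicas with the node that holds the original data and the current list of Member Nodes returns the nodes that should receive copies. A real service would also need to weigh storage capacity, access policies, and repair of replicas when a node becomes unreachable, none of which is modeled here.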

DataNetONE is an ambitious project and will succeed only through careful planning and widespread support for its mission. A major focus will be the sustainability of DataNetONE, with the goal that the network has both the financial and the technical means to be self-sustaining after ten years. This sustainability goal is critical to the preservation of data that can be used to understand biodiversity and our natural world.