Proceedings of TDWG, 2008

Data Exchange Standards – The Case for Being Stupidly Simple

Chuck Miller

Abstract


Why are the data exchange standards of Biodiversity Information Standards (TDWG) so complex? Schema-based standards such as Structured Descriptive Data (SDD), Access to Biological Collections Data (ABCD), and the Taxonomic Concept Transfer Schema (TCS) can involve hundreds of nodes, which makes mapping data sources to them a difficult task. Extensions into deeply layered ontologies and Resource Description Framework (RDF) triples add still more complexity. Although there are sound reasons for full analysis and for a detailed, complete accounting of all possible data values and structures, there are counterbalancing reasons to consider simpler structures as well. Most institutions with biodiversity data to share do not have the skilled informatics staff needed to implement complex data interfacing techniques, and they may lack the background to understand even the descriptions of those techniques. TDWG therefore runs the risk of producing standards so complex that they further isolate, rather than capture, a large segment of the available biodiversity data.

Can we relearn the lesson of Darwin Core (DwC)? Despite its limitations (multiple versions in use, difficulties as an XML Schema and, some say, a lack of scalability), it is arguably the most widely used biodiversity data exchange standard in the world, exchanging over 140 million records from 3,000 datasets within the GBIF network alone. Why? It is simple and linear. Although its simplicity means some information is necessarily missing, it still conveys the essential facts about specimens and occurrences. And because it is simple and linear, it can be implemented with basic informatics knowledge. The lesson from Darwin Core is that we need simple, linear exchange standards for every category of data: concepts, names, descriptions, and more. These would not replace TCS, SDD, or ABCD but would complement and integrate with them. This presentation will suggest some approaches to a simpler side of TDWG.
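To make "simple and linear" concrete, here is a minimal sketch, assuming an institution keeps its specimen data in a flat table. The local column names (`cat_no`, `taxon`, `drawer`, and so on), the `LOCAL_TO_DWC` mapping, and the sample values are all hypothetical. The target names follow the current Darwin Core term list; earlier versions spelled some terms differently. Publishing a record amounts to renaming columns and writing one row per specimen, which is why basic informatics skills can suffice. A column with no Darwin Core counterpart (here `drawer`) is simply dropped, mirroring the trade-off between simplicity and completeness described above.

```python
import csv
import io

# Hypothetical local column names (left) mapped to Darwin Core term names (right).
# The mapping is one-to-one and flat: no nesting, no hierarchy to navigate.
LOCAL_TO_DWC = {
    "cat_no": "catalogNumber",
    "inst": "institutionCode",
    "taxon": "scientificName",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "date_collected": "eventDate",
    "collector": "recordedBy",
}


def to_dwc(row: dict) -> dict:
    """Rename local columns to Darwin Core terms; unmapped columns are dropped."""
    record = {LOCAL_TO_DWC[key]: value for key, value in row.items() if key in LOCAL_TO_DWC}
    # Assumption: every row describes a preserved specimen, so this is a constant.
    record["basisOfRecord"] = "PreservedSpecimen"
    return record


def export_csv(rows: list) -> str:
    """Write the records as one flat CSV table with a Darwin Core header row."""
    fieldnames = list(LOCAL_TO_DWC.values()) + ["basisOfRecord"]
    buffer = io.StringIO()
    # Missing values are written as empty strings (DictWriter's default restval).
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        writer.writerow(to_dwc(row))
    return buffer.getvalue()


if __name__ == "__main__":
    # Illustrative data only; "drawer" has no Darwin Core counterpart here.
    sample = [{
        "cat_no": "12345",
        "inst": "EXAMPLE",
        "taxon": "Quercus alba L.",
        "lat": "38.63",
        "lon": "-90.26",
        "date_collected": "1998-05-14",
        "collector": "J. Smith",
        "drawer": "B7",
    }]
    print(export_csv(sample))
```

The output is a single header row of Darwin Core terms followed by one row per specimen. The same data expressed in a deeply nested schema would require knowing where each value sits in the hierarchy, which is exactly the mapping burden the abstract describes.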