Capturing structured data to facilitate web revisions
Dave Roberts, Julius Welby, Markus Döring
Abstract
To write a taxonomic revision, an author must assemble the range of existing descriptions, bring them into a common framework (i.e., standardise them) and consider how well they form delineated groups. In general, the existing descriptions are free-text blocks, with the associated nomenclatural and relationship information usually laid out in a structured (formatted) manner. The EU project EDIT has devised a general information-flow structure to guide the development of tools that assist taxonomists in their work and bring the products of taxonomic effort more efficiently to the broader user community.
From a sociological perspective we consider it essential to design ways of working that mesh seamlessly with the way taxonomists work now. To that end we have investigated the natural language application GoldenGATE as a means to add structure to both the free-text descriptions and the formatted nomenclatural elements of both published and new work. The primary intention is to capture content from manuscripts (word processor documents) rather than from published sources per se.
We will describe the information model that is guiding EDIT development and the advantages that structured data can offer in increasing the efficiency of the taxonomic workflow. Better tools for processing taxonomic information are of significantly greater value if there is information to be processed. In other words, we need to establish a bank of structured content and demonstrate the benefits of working with structured data if we are to engage new users with this improved way of working. The goal is to motivate users to invest the effort required to understand and use structured data tools.