Global Compositae Checklist: Integrating, Editing and Tracking Multiple Datasets
Christina Flann, Aaron Wilton, Kevin Richards, Jerry Cooper
Abstract
The Global Compositae Checklist is an ambitious project that aims to integrate existing electronic data sources for one of the largest plant families in the world, providing definitive nomenclatural information and up-to-date taxonomic concepts. Purpose-built Checklist Software has been designed and developed by Landcare Research in New Zealand. The Checklist Software 1) imports multiple existing datasets, 2) integrates the datasets using rules, and 3) provides a transparent digital audit trail for the integration process and for subsequent manual annotation and editing. Datasets have been contributed by many providers at major botanical institutes around the world. Datasets are imported either via the Taxon Concept Schema (TCS) standard, in one of the first real uses of this TDWG standard, or via a defined, fixed MS Access format. The provider records are then integrated into the Checklist using a simple algorithm that tests each record against existing records for possible matches. The Checklist Software includes a tool for matching variant author abbreviations. This tool can be updated continually so that any given variant is referred to the correct abbreviation, allowing names with varying author citations to be matched. Using these methods, ‘provider’ records are linked to a new or an existing consensus record. The nomenclatural data are then verified by an editor, with any edits added as an editor’s provider record. Following each integration or edit, the data for the consensus record are recalculated, with the value of each field determined by a majority consensus of the linked provider records. However, this simple majority is overridden by an editor’s record, which has priority over other provider records. Taxonomic concepts are also included where data providers supply them in their datasets, and they are integrated following the same principles as the nomenclatural data.
The data from the Checklist will be available through the Checklist website and via a TCS-mediated web service.
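
The consensus calculation described in the abstract (a per-field majority over the linked provider records, overridden by an editor's record) can be illustrated with a minimal sketch. This is not the Checklist Software's actual code: the record layout (dictionaries with a `provider` key), the `"editor"` label, the sample variant table, and the function names are all assumptions made for illustration, and the handling of ties and of multiple editor records is not specified in the abstract.

```python
from collections import Counter

# Hypothetical variant table: variant author abbreviation -> standard form.
# The real tool's data and structure are not described in the abstract.
AUTHOR_VARIANTS = {"Linn.": "L.", "Linnaeus": "L."}


def standard_author(author):
    """Map a variant author abbreviation to its standard form, if known."""
    return AUTHOR_VARIANTS.get(author, author)


def consensus_value(provider_records, field):
    """Return the consensus value of one field for a consensus record.

    An editor's record that supplies the field takes priority (the first
    such record wins here, an assumption); otherwise the most common
    value among the linked provider records is used.
    """
    for record in provider_records:
        if record["provider"] == "editor" and record.get(field) is not None:
            return record[field]
    values = [r[field] for r in provider_records if r.get(field) is not None]
    if not values:
        return None
    # On a tie, Counter keeps first-seen order; the source does not say
    # how the real software resolves ties.
    return Counter(values).most_common(1)[0][0]


# Three provider records linked to the same consensus record.
records = [
    {"provider": "A", "author": standard_author("L."), "year": "1753"},
    {"provider": "B", "author": standard_author("Linn."), "year": "1754"},
    {"provider": "C", "author": standard_author("Linnaeus"), "year": "1753"},
]
majority_year = consensus_value(records, "year")

# An editor's record overrides the majority for the fields it supplies.
edited = records + [{"provider": "editor", "year": "1754"}]
edited_year = consensus_value(edited, "year")
edited_author = consensus_value(edited, "author")
```

In this sketch the editor's record changes only the `year` field; the `author` field, which the editor did not supply, still falls back to the majority of the other provider records.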