Proceedings of TDWG, 2007

Building an index of all genera: A test case in interchange

David P. Remsen, David J. Patterson

Abstract


A challenge facing the Global Biodiversity Information Facility (GBIF), the Encyclopedia of Life (EoL), and other initiatives that manage large amounts of species information is organizing the data in a way that makes biological sense. Data mobilized within these initiatives are associated with taxon names, many of which are no longer accepted or are misspelled. The GBIF ECAT program draws upon the Catalog of Life (CoL) as the major component of its taxonomic infrastructure but cannot effectively assess names that are absent from the catalog. The primary goal of the CoL is to compile the currently accepted names of living taxa, not to organize all taxon names associated with past and present biodiversity data. Thus, additional components of the organizational infrastructure are needed to complement, not complicate, existing efforts.

One such component is a catalog of all biological genera. Genera offer organizational value for species data because a genus name is a component of every species combination. A complete catalog of all genera offers a number of compelling benefits from both an organizational and a biological reference perspective. First, it provides the means to identify and quantify all generic-level homonyms, and such a compilation may help prevent the creation of new ones. Conversely, identifying all uniquely spelled genera is also valuable, because any species combination referencing such a genus name can be assigned to that genus unambiguously. Assigning all genera to a provisional consensus taxonomic position is a relatively accessible ambition and is already underway. A genus provisionally placed within a higher taxon links all associated combinations with that higher group. Such a structure can help disambiguate taxon references that are otherwise undifferentiated. Coupled with high-quality data mobilization efforts, such as the Biodiversity Heritage Library, it enables taxon experts, for example, to be alerted to previously unknown taxon references in their area of expertise.
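The partitioning described above can be sketched in a few lines of Python. This is only an illustration, not the AGI implementation: the record layout (genus name, authorship, provisional higher taxon), the function names, and the small sample list are all assumptions made for the example.

```python
from collections import defaultdict

# Illustrative sample records (assumed layout, not AGI data):
# (genus name, authorship, provisional higher taxon)
RECORDS = [
    ("Pieris", "Schrank", "Insecta"),
    ("Pieris", "D.Don", "Plantae"),
    ("Ficus", "L.", "Plantae"),
    ("Ficus", "Röding", "Mollusca"),
    ("Quercus", "L.", "Plantae"),
]


def partition_genera(records):
    """Split genus names into homonym groups and uniquely spelled names."""
    by_name = defaultdict(set)
    for name, author, higher_taxon in records:
        by_name[name].add((author, higher_taxon))
    # A spelling shared by more than one (author, higher taxon) pair is a homonym group.
    homonyms = {n: sorted(v) for n, v in by_name.items() if len(v) > 1}
    # A spelling with exactly one pair is unique within this catalog.
    unique = {n: next(iter(v)) for n, v in by_name.items() if len(v) == 1}
    return homonyms, unique


def higher_taxon_for(species_combination, unique):
    """Return the higher taxon for a combination, or None if the genus is ambiguous or unknown."""
    genus = species_combination.split()[0]
    entry = unique.get(genus)
    return entry[1] if entry else None


homonyms, unique = partition_genera(RECORDS)
```

With these sample records, a combination such as "Quercus robur" resolves to a single higher taxon, while "Ficus carica" returns None because the genus spelling alone cannot distinguish the plant from the mollusc genus. The "unique" result only holds relative to the records supplied, which is why completeness of the catalog matters.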

To build this index, the All Genera Index (AGI) must coordinate with a number of existing nomenclatural initiatives that already catalog large subsets of generic names. It must also reconcile overlap where it exists, which requires informed lexical comparison algorithms that can confidently distinguish spelling variation in genus-author combinations from true homonyms. Interchange is bidirectional, as the index itself may serve as a point of origin for previously uncataloged names. All of this depends on access to flexible, standard messaging systems that enable the exchange and synchronization of both verified and unverified nomenclatural records between systems that may employ different implementation mechanisms. Such an exercise, focused on genera and a small number of providers, will serve as a useful test case before attempting the same on a larger scale with species checklists.
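As a rough illustration of the lexical comparison problem, the following Python sketch compares the authorship strings of two records that share a genus spelling. It is not the AGI's algorithm: the function names, the use of the standard library's difflib similarity ratio, and the 0.8 threshold are all assumptions chosen for the example.

```python
import re
from difflib import SequenceMatcher


def normalize(author):
    """Lowercase an authorship string and drop punctuation."""
    return re.sub(r"[^\w ]", "", author.lower()).strip()


def classify(rec_a, rec_b, threshold=0.8):
    """Classify two (genus, authorship) records; the threshold is an assumed cutoff."""
    (genus_a, author_a), (genus_b, author_b) = rec_a, rec_b
    if genus_a.lower() != genus_b.lower():
        return "different names"
    similarity = SequenceMatcher(None, normalize(author_a), normalize(author_b)).ratio()
    # Near-identical authorship suggests one name recorded with a spelling variation;
    # clearly different authorship suggests two separately published names.
    return "probable variant" if similarity >= threshold else "candidate homonym"


variant = classify(("Pieris", "Schrank, 1801"), ("Pieris", "Schranck 1801"))
homonym = classify(("Pieris", "Schrank, 1801"), ("Pieris", "D.Don, 1834"))
```

A purely lexical measure like this one would misjudge abbreviated authorship (for example "L." against "Linnaeus"), which is one reason the comparison needs to be informed by nomenclatural conventions and why borderline cases would still need review.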

The AGI will serve as a staging interface between authoritative nomenclators and current digitization activities, such as the Biodiversity Heritage Library, that will undoubtedly uncover novel taxon references with name combinations not yet included in indexed compilations. The AGI will not only provide a provisional repository for such unverified records but will also play a vital organizational role in supporting the verification and assembly of a complete catalog of taxa and associated names.