Biodiversity Information Projects of the World

TAXAMATCH


Project Website
http://www.cmar.csiro.au/datacentre/taxamatch.htm
Project Description
TAXAMATCH is an algorithm (more precisely, a set of algorithms and rules used in combination) for fuzzy matching of taxon scientific names. It copes with misspelled user input (e.g., via a web query interface) and with misspelled stored data: for example, detecting near-duplicate names in two data sets that are to be matched or merged, so that inconsistencies can be rationalised, or deduplicating a pre-existing resource. TAXAMATCH combines a fast phonetic algorithm for specialised tasks (such as parallel testing, flagging high-precision phonetic matches, and/or use in "rapid" variants of TAXAMATCH) with a modified Edit Distance (ED) approach that detects non-phonetic as well as phonetic errors. Both algorithms were custom developed for TAXAMATCH, initially as functions in the Oracle PL/SQL programming language, with replication into other languages planned as part of the activities of a TAXAMATCH development community.
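The general idea of pairing a phonetic test with an edit distance test can be sketched as follows. This is only an illustration in Python, not the TAXAMATCH code: the phonetic rules in `phonetic_key`, the plain Levenshtein distance (standing in for the modified ED), the `max_distance` threshold, and all function names are assumptions made for the example.

```python
def phonetic_key(name: str) -> str:
    """Crude phonetic normalisation (hypothetical rules, not TAXAMATCH's)."""
    s = name.lower()
    # Collapse a few sound-alike spellings.
    for old, new in (("ae", "e"), ("oe", "e"), ("ph", "f"), ("y", "i"), ("k", "c")):
        s = s.replace(old, new)
    # Drop immediately repeated characters ("ll" -> "l").
    out = []
    for ch in s:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (an unmodified stand-in for the modified ED)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def match_name(query, candidates, max_distance=2):
    """Return (candidate, kind) pairs, where kind is 'exact', 'phonetic' or 'near'."""
    results = []
    query_lower = query.lower()
    query_key = phonetic_key(query)
    for cand in candidates:
        if cand.lower() == query_lower:
            results.append((cand, "exact"))
        elif phonetic_key(cand) == query_key:
            results.append((cand, "phonetic"))
        elif edit_distance(query_lower, cand.lower()) <= max_distance:
            results.append((cand, "near"))
    return results


if __name__ == "__main__":
    names = ["Homo sapiens", "Philodendron", "Pan troglodytes"]
    print(match_name("Homo sapeins", names))
    print(match_name("Filodendron", names))
```

In this sketch the phonetic test runs first because it is cheap and precise, and the edit distance test then catches errors that do not preserve the sound of the name, such as transposed letters.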
Contact
Tony Rees
Project Leader / principal developer
CSIRO Marine and Atmospheric Research, Australia
Project Type
Facilitator
Project Language
English
Project Start Date
01-Aug-2007
Key Inputs
-
Key Infrastructure
The reference implementation currently runs against the IRMNG database; access point: http://www.cmar.csiro.au/datacentre/irmng/
Key Technologies
Version 1 of TAXAMATCH is initially available for Oracle databases and is written in the Oracle PL/SQL programming language; other languages are to follow.
Key Processes
Used for correction of misspelled user input to taxonomic database searches, recognition of near matches in multiple species lists, and deduplication (quality assurance/review) of content in existing systems.
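As a rough illustration of recognising near matches between two species lists, the sketch below uses Python's standard `difflib` module as a generic similarity measure. It is a stand-in, not TAXAMATCH: the function name `near_matches`, the `cutoff` value, and the sample names are assumptions made for the example.

```python
import difflib


def near_matches(list_a, list_b, cutoff=0.85):
    """Pair each name in list_a with similar but non-identical names in list_b."""
    # Compare case-insensitively, but report the original spelling from list_b.
    lowered_b = {name.lower(): name for name in list_b}
    pairs = []
    for name in list_a:
        hits = difflib.get_close_matches(
            name.lower(), list(lowered_b), n=3, cutoff=cutoff
        )
        for hit in hits:
            if hit != name.lower():  # identical names are not "near" matches
                pairs.append((name, lowered_b[hit]))
    return pairs


if __name__ == "__main__":
    a = ["Carcharodon carcharias", "Thunnus albacares"]
    b = ["Carcharodon carcharius", "Thunnus albacares", "Gadus morhua"]
    for left, right in near_matches(a, b):
        print(f"{left!r} may be the same taxon as {right!r}")
```

Running the same function with one list passed as both arguments gives a simple deduplication check within a single resource, although each near-duplicate pair is then reported in both directions.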
Geographic Scope
Global - Global
Taxonomic Scope
Life - Applicable to names governed by any of the taxonomic codes (except cultivars and hybrids at this time)
Comments
A TAXAMATCH developers' wiki is available at https://wiki.csiro.au/confluence/display/taxamatch/ (username and password required; available on request from Tony Rees).
Record Status
Information about this project is Complete.

  Last Modified: 02 June 2007