TAXAMATCH
- Project Website
- http://www.cmar.csiro.au/datacentre/taxamatch.htm
- Project Description
- TAXAMATCH is an algorithm (more precisely, a set of algorithms and rules used in combination) developed for fuzzy matching of taxon scientific names. It is useful for coping with misspelled user input (e.g., via a web query interface) or misspelled stored data: for example, detecting near-duplicate names in two data sets to be matched or merged so that inconsistencies can be rationalised, or deduplicating a pre-existing resource. TAXAMATCH combines a fast phonetic algorithm for specialised tasks (such as parallel testing, flagging high-precision phonetic matches, and/or use in "rapid" variants of TAXAMATCH) with a modified Edit Distance (ED) approach for detecting non-phonetic as well as phonetic errors. Both algorithms were custom developed for TAXAMATCH, initially as functions in the Oracle PL/SQL programming language, with replication into other languages planned as part of the activities of a TAXAMATCH development community.
- Contact
- Tony Rees
Project Leader / principal developer
CSIRO Marine and Atmospheric Research, Australia
- Project Type
- Facilitator
- Project Language
- English
- Project Start Date
- 01-Aug-2007
- Key Inputs
- -
- Key Infrastructure
- The reference implementation currently runs against the IRMNG database; access point: http://www.cmar.csiro.au/datacentre/irmng/ .
- Key Technologies
- Version 1 of TAXAMATCH is initially available for Oracle databases and is written in the Oracle PL/SQL programming language; other languages are to follow.
- Key Processes
- Used for correction of misspelled user input to taxonomic database searches, recognition of near matches in multiple species lists, and deduplication (quality assurance/review) of content in existing systems.
- Geographic Scope
- Global - Global
- Taxonomic Scope
- Life - Applicable to names governed by any of the taxonomic codes (except cultivars and hybrids at this time)
- Comments
- A TAXAMATCH developers' wiki is available at https://wiki.csiro.au/confluence/display/taxamatch/ ; a username and password are required (available on request from Tony Rees).
- Record Status
- Information about this project is Complete.
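
The Project Description above outlines a two-part approach: a fast phonetic comparison combined with an edit-distance comparison. The following Python sketch illustrates that general idea only. It is not the TAXAMATCH code: the phonetic rules here are a small invented set, the edit distance is plain Levenshtein rather than TAXAMATCH's modified ED, and the function names (`phonetic_key`, `edit_distance`, `fuzzy_match`) and the `max_distance` threshold are assumptions made for illustration.

```python
def phonetic_key(name: str) -> str:
    """Crude illustrative phonetic key (NOT the TAXAMATCH phonetic algorithm)."""
    s = name.lower()
    # Collapse a few sound-alike spellings (hypothetical rule set).
    for src, dst in (("ae", "e"), ("oe", "e"), ("ph", "f"), ("y", "i"), ("k", "c")):
        s = s.replace(src, dst)
    # Collapse runs of the same character, e.g. "ll" -> "l".
    out = []
    for ch in s:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (TAXAMATCH uses a modified variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def fuzzy_match(query, names, max_distance=2):
    """Return (name, match_type, distance) tuples, closest first.

    A name is reported as a "phonetic" match when its key equals the
    query's key, otherwise as an "edit_distance" match when it lies
    within max_distance (an assumed threshold) of the query.
    """
    q = query.lower()
    qkey = phonetic_key(query)
    results = []
    for name in names:
        d = edit_distance(q, name.lower())
        if phonetic_key(name) == qkey:
            results.append((name, "phonetic", d))
        elif d <= max_distance:
            results.append((name, "edit_distance", d))
    results.sort(key=lambda r: (r[2], r[0]))
    return results


if __name__ == "__main__":
    stored = ["Homo sapiens", "Philodendron", "Pomatomus saltatrix"]
    print(fuzzy_match("Homo sapeins", stored))
    print(fuzzy_match("Filodendron", stored))
```

In this sketch the phonetic key is checked first because it is cheap and, when it agrees, gives a high-precision signal; the edit distance then catches errors that do not preserve pronunciation, such as transposed letters.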