Proceedings of TDWG, 2007

TOQE - A Thesaurus Optimized Query Expander

Niels Hoffmann, Patricia Kelbert, Pepe Ciardelli, Anton Güntsch

Abstract


Websites and web portals allow users to search unit-level data such as specimen and observation records. Unfortunately, most of their search engines return only records that literally match the search terms, and do not look for conceptually related data.

At the same time, several thesauri, such as taxonomic checklists and country lists, are already available online. These thesauri allow users to retrieve a list of related elements for a given term or concept.

As long as web portals are not integrated with these available thesauri, users will find searching frustrating and imprecise. Integrating a thesaurus into the search engine would make searching much more efficient. A search engine equipped with a thesaurus interface would first query the thesaurus for concepts related to the given search term, and then run the original query "expanded" with the results from the thesaurus.
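The two-step idea described above (look up related concepts, then search with the expanded term list) can be sketched as follows. This is a minimal illustration, not TOQE's implementation: the dictionary-based thesaurus, the function names, and the placeholder names "Aus bus" and "Xus yus" are all assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical toy thesaurus: maps a term to related terms (e.g. synonyms).
# The names are invented placeholders, not real checklist data.
TOY_THESAURUS: Dict[str, List[str]] = {
    "Aus bus": ["Xus yus"],
}


def expand_query(term: str, thesaurus: Dict[str, List[str]]) -> List[str]:
    """Return the original term followed by its related terms, without duplicates."""
    expanded = [term]
    for related in thesaurus.get(term, []):
        if related not in expanded:
            expanded.append(related)
    return expanded


def search(records: List[str], terms: List[str]) -> List[str]:
    """Literal matching: keep every record that contains at least one term."""
    return [record for record in records if any(t in record for t in terms)]


records = ["Aus bus, Berlin 1901", "Xus yus, Paris 1850", "Cus dus, Rome 1900"]

# Without expansion only the literal match is found.
literal_hits = search(records, ["Aus bus"])

# With expansion the record filed under the related name is found as well.
expanded_hits = search(records, expand_query("Aus bus", TOY_THESAURUS))
```

Here `literal_hits` contains one record, while `expanded_hits` also contains the record stored under the related name.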

Such an interface should include operators to retrieve:
- a list of relationship types or methods implemented by the thesaurus and
- a list of semantically-related concepts for a given search term.

The service should also be able to connect to any kind of structured thesaurus source, such as XML Topic Maps (XTM), the Simple Knowledge Organization System (SKOS), or a Relational Database Management System (RDBMS).
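One way to picture these requirements is a small abstract interface with the two operations listed above, implemented once per thesaurus format. The sketch below is hypothetical: the class and method names are assumptions, and the in-memory backend merely stands in for a real XTM, SKOS, or RDBMS connector.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple


class ThesaurusBackend(ABC):
    """Hypothetical common interface; one subclass per thesaurus format."""

    @abstractmethod
    def relationship_types(self) -> List[str]:
        """List the relationship types this thesaurus implements."""

    @abstractmethod
    def related_concepts(self, term: str, relationship: Optional[str] = None) -> List[str]:
        """List concepts related to `term`, optionally limited to one relationship type."""


class InMemoryBackend(ThesaurusBackend):
    """Stand-in backend holding (subject, relationship, object) triples."""

    def __init__(self, triples: List[Tuple[str, str, str]]) -> None:
        self._triples = triples

    def relationship_types(self) -> List[str]:
        return sorted({rel for _, rel, _ in self._triples})

    def related_concepts(self, term: str, relationship: Optional[str] = None) -> List[str]:
        return [
            obj
            for subj, rel, obj in self._triples
            if subj == term and (relationship is None or rel == relationship)
        ]


# Invented placeholder data, not taken from any real checklist.
backend = InMemoryBackend([
    ("Aus bus", "synonym", "Xus yus"),
    ("Aus bus", "broader", "Aus"),
])
```

Because the client only sees `ThesaurusBackend`, replacing the in-memory class with a connector for another format would not change the calling code.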

The Thesaurus Optimized Query Expander (TOQE) has been implemented as a web service. TOQE provides the client with a fixed set of methods, thereby hiding the complexity and structure of the underlying thesaurus. The service translates each request into queries applicable to the underlying thesaurus. The results are then converted into XML conforming to a well-defined schema and returned to the client. Querying a thesaurus thus becomes transparent, generic, and independent of the thesaurus used.
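The final step, returning results in a uniform XML form regardless of the thesaurus behind the service, might look like the sketch below. The element and attribute names (`response`, `concept`, `term`, `relationship`) are assumptions chosen for illustration and do not reproduce the actual TOQE schema.

```python
import xml.etree.ElementTree as ET
from typing import List


def to_xml(term: str, relationship: str, concepts: List[str]) -> str:
    """Serialize related concepts into a small XML document (hypothetical layout)."""
    root = ET.Element("response", {"term": term, "relationship": relationship})
    for name in concepts:
        # One <concept> child element per related concept.
        ET.SubElement(root, "concept").text = name
    return ET.tostring(root, encoding="unicode")


# Placeholder names, not real checklist data.
xml_text = to_xml("Aus bus", "synonym", ["Xus yus"])
```

A client can parse `xml_text` with any XML library without knowing which kind of thesaurus produced the concepts.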

TOQE is already used by the SYNTHESYS portal for access to specimens and observations (http://search.biocase.org/synth-ui), with the Euro+Med taxonomic checklist as the thesaurus database. The web service can be accessed at http://ww3.bgbm.org/toqe/.