Biodiversity Information Projects of the World

Automatic Biodiversity Literature Enhancement


Project Website
http://able.myspecies.info/
Project Description
ABLE is a collaboration between two UK-based institutions, the Open University and the Natural History Museum, London. It aims to improve access to collections of scanned taxonomic documents.
We are developing tools to automatically mark up documents from existing large-scale scanning projects, such as the Biodiversity Heritage Library (BHL). The scale of BHL, which is scanning pages at a rate of 600,000 a month, makes automatic mark-up a necessity.
We focus on extracting metadata (taxon, person and place names, and dates) and on enhancing the searchability of those terms. To do so, we use associative techniques from Natural Language Processing (NLP) to overcome errors in the text, such as those introduced by Optical Character Recognition (OCR).
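As an illustration of the general idea only, the sketch below shows one simple way to match an OCR-damaged name against a list of known taxon names using approximate string matching from the Python standard library. It is not the ABLE implementation: the reference list KNOWN_TAXA, the function name match_taxon, and the similarity cutoff of 0.8 are all assumptions made for this example.

```python
import difflib

# Hypothetical reference list; a real system would draw on a much larger
# authority file of taxon names.
KNOWN_TAXA = ["Quercus robur", "Fagus sylvatica", "Betula pendula"]


def match_taxon(ocr_text, known=KNOWN_TAXA, cutoff=0.8):
    """Return the known taxon name closest to ocr_text, or None.

    The cutoff (an assumed value) is the minimum similarity ratio,
    between 0 and 1, that a candidate must reach to count as a match.
    """
    # Compare in lower case so that capitalisation errors do not matter,
    # but return the name in its original form.
    lookup = {name.lower(): name for name in known}
    hits = difflib.get_close_matches(
        ocr_text.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None
```

Calling match_taxon("Quercus rohur"), where OCR has misread a "b" as an "h", should recover "Quercus robur", while a string that resembles none of the known names yields None. Raising the cutoff reduces false matches at the cost of missing more heavily damaged names.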
Contact
David (known as Dauvit) King
Research Associate
The Open University
Project Type
Facilitator
Project Language
English
Project Start Date
01-Oct-2008
Project End Date
23-Feb-2010
Key Inputs
JISC: funding as part of their Enriching Digital Resources programme
BHL: data as part of their digitisation programme
Key Infrastructure
EDIT Scratchpads (based on Drupal) for project communication and dissemination
Key Technologies
PHP, Java, Python, XML, XSL, XPath
Key Processes
Manual and automated XML mark-up
Information extraction with fuzzy matching
Geographic Scope
Taxonomic Scope
Comments
Record Status
Information about this project is Complete.

  Last Modified: 02 June 2007