Automatic Biodiversity Literature Enhancement
- Project Website
- http://able.myspecies.info/
- Project Description
- ABLE is a collaboration between two UK-based institutions, the Open University and the Natural History Museum, London. It aims to improve access to collections of scanned taxonomic documents.
We are developing tools to automatically mark up documents from existing large-scale scanning projects, such as the Biodiversity Heritage Library (BHL). BHL is scanning pages at a rate of 600,000 a month. Manual mark up cannot keep pace at that scale, so it has to be automatic.
We focus on extracting metadata (taxon, people and place names, and dates). We also aim to make those terms easier to search. To do this, we use associative techniques from Natural Language Processing (NLP) to overcome errors in the text, such as those introduced by Optical Character Recognition (OCR).
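This record does not say which matching algorithm ABLE uses. As an illustration only, the sketch below shows one common way to tolerate OCR errors when looking up taxon names. It compares candidate word pairs against a known name list using plain Levenshtein (edit) distance.

Several parts of the sketch are hypothetical and are not part of ABLE:
- the function names `levenshtein`, `best_match` and `find_taxa`;
- the name list;
- the `max_ratio` threshold;
- the "capitalised word followed by another word" heuristic for spotting binomials.

```python
"""Sketch: fuzzy lookup of taxon names in OCR text (hypothetical, not ABLE's code)."""


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if equal)
            ))
        previous = current
    return previous[-1]


def best_match(candidate, names, max_ratio=0.2):
    """Return the closest known name, or None if it is too far away.

    max_ratio is an assumed threshold: the allowed distance is
    max_ratio * len(known name).
    """
    best, best_dist = None, None
    for name in names:
        d = levenshtein(candidate.lower(), name.lower())
        if best_dist is None or d < best_dist:
            best, best_dist = name, d
    if best is not None and best_dist <= max_ratio * len(best):
        return best
    return None


def find_taxa(text, names, max_ratio=0.2):
    """Check each capitalised word plus the following word as a possible binomial."""
    words = text.split()
    hits = []
    for first, second in zip(words, words[1:]):
        if not first[:1].isupper():
            continue  # genus names are capitalised; skip everything else
        candidate = f"{first} {second}".strip(".,;:")  # drop edge punctuation
        match = best_match(candidate, names, max_ratio)
        if match:
            hits.append((candidate, match))
    return hits
```

For example, `find_taxa("A specimen of Tnrdus merula was taken near Parus rnajor.", ["Turdus merula", "Parus major"])` pairs each OCR-damaged form with the name it most likely stands for. The sketch compares every word pair against the whole name list. That is fine for a small example, but text at BHL scale would need an index over the names instead.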
- Contact
- David (known as Dauvit) King
Research Associate
The Open University
- Project Type
- Facilitator
- Project Language
- English
- Project Start Date
- 01-Oct-2008
- Project End Date
- 23-Feb-2010
- Key Inputs
- JISC: funding as part of their Enriching Digital Resources programme
BHL: data as part of their digitisation programme
- Key Infrastructure
- EDIT Scratchpads (based on Drupal) for project communication and dissemination
- Key Technologies
- PHP, Java, Python, XML, XSL, XPath
- Key Processes
- Manual and automated XML mark up
Information extraction with fuzzy matching
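As a hedged illustration of the automated XML mark up step, the sketch below wraps recognised character spans of a page in XML elements. It uses Python's standard `xml.etree.ElementTree`.

Some details are assumptions rather than a published ABLE schema:
- the `mark_up` function;
- the `(start, end, tag)` span format;
- the element names `page`, `taxon` and `place`.

In practice, the spans would come from an extraction step such as fuzzy name matching.

```python
import xml.etree.ElementTree as ET


def mark_up(text, spans):
    """Wrap (start, end, tag) character spans of text in XML elements.

    Spans are assumed sorted and non-overlapping. Tag names are
    illustrative only.
    """
    page = ET.Element("page")

    def append_text(s):
        # Text before the first child goes in page.text; later text
        # goes in the .tail of the most recent child element.
        if len(page) == 0:
            page.text = (page.text or "") + s
        else:
            page[-1].tail = (page[-1].tail or "") + s

    cursor = 0
    for start, end, tag in spans:
        append_text(text[cursor:start])     # plain text before this span
        element = ET.SubElement(page, tag)  # e.g. <taxon>...</taxon>
        element.text = text[start:end]
        cursor = end
    append_text(text[cursor:])              # any trailing plain text
    # ElementTree escapes special characters such as "&" when serialising.
    return ET.tostring(page, encoding="unicode")
```

For example, `mark_up("Parus major was seen in Kent.", [(0, 11, "taxon"), (24, 28, "place")])` returns `<page><taxon>Parus major</taxon> was seen in <place>Kent</place>.</page>`. The sketch builds the tree with ElementTree rather than concatenating strings, so escaping and well-formedness are handled by the library.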
- Geographic Scope
-
- Taxonomic Scope
-
- Comments
-
- Record Status
- Information about this project is Complete.