Proceedings of TDWG, 2006

A New Model for Descriptive Knowledge

Antoine Chalubert, Régine Vignes Lebbe

Abstract


The description of biological entities, such as species and other taxonomic groups, is the basis of most biological knowledge. Being able to compare taxonomic descriptions is essential for analysis, classification and identification. A formal representation of descriptive knowledge is needed in order to justify the relevance of a particular algorithm. In general, this justification is provided by the implemented method itself, for example a matrix of taxa by characters for phylogenetic analysis. However, these methods do not provide an explicit and complete knowledge representation that can express all the meanings of "character" found in the systematic literature. Another consequence of these partial representations is that various methods (e.g. identification and phylogenetic analysis) cannot be integrated or combined on the same knowledge base.

Our aim is to develop an extensive data and knowledge-processing platform for systematics that integrates taxonomy, identification and phylogenetics. Our proposal continues previous work on knowledge base editors and computer-aided identification (KB-CAI) software such as XPER, NEMISYS, DELTA and IKBS, and on the proposals of the SDD (Structure of Descriptive Data) working group. All recent results offer a limited formalism, restricted to the treatment of descriptive data: they deal with descriptions of the properties of objects but cannot handle knowledge related to organisms, such as their structural and anatomical description (see Pullan et al., The Prometheus Description Model: an examination of the taxonomic description-building process and its representation). They can manage the polymorphism of objects but do not allow any estimation of the reliability of the data or of its traceability. They offer little scope for extension to character types other than those initially considered, and they do not allow descriptions to be modified, for example when a new type of data becomes available. They do not provide any complementary contextual information (such as the required proficiency or the conditions of observation). Nor do their methods allow conversion from one type of information to another (e.g. numerical to qualitative).

We propose a new model for descriptive data that aims to address these shortcomings. We show that our model allows innovative representations of concepts and treatments, an extensive pool of state character types, the representation of complex anatomical descriptions, phylogenetic analysis, and control of the reliability of data and of the user proficiency level. Our proposal is partially implemented in the computer program KB-CAI. This software is being improved to build a complete framework for computer-aided systematics.