Proceedings of TDWG, 2009

Landscape of the information standards for plant genebanks

Theo van Hintum

Abstract


Genebanks, which conserve plant genetic resources (PGR) for use in plant breeding and crop research, have been around since the 1960s. Documentation was originally kept in books, on cards, and in other local systems. Once computers became accessible, these systems were converted into simple databases in the best case, but more often into spreadsheets. Each system had its own local conventions for the structure of the data and the coding used in it. Only when the genebank community started exchanging data did the need for standardization arise. The first data to be exchanged, and so far the only, were passport data: data describing the identity and origin of the samples. On the basis of these data, so-called central crop databases were created, giving an overview of all PGR accessions of a crop and its wild relatives maintained in Europe or in the entire world.

To facilitate this exchange, the Multi-Crop Passport Descriptor List (MCPDL) was compiled. This was nothing more than a list of 28 commonly used descriptors, covering the maintaining institute, the local ID, the IDs given by the collecting expedition and/or the donating institute, the taxonomic classification, information about the collecting location, etc. The use of some codes was required. Some of these were defined by the MCPDL itself (for the biological status of the accession, for the classification of the collecting or acquisition source, and for the type of storage), but the more important ones came from two externally maintained coding systems: the 3-letter ISO 3166 country codes and the Institution Codes maintained by the FAO. The first system worked relatively well, although it did not maintain historical codes, such as the one for the Soviet Union; this caused problems, since such codes appeared, and still appear, frequently in the datasets. The second system, the one for institution codes, proved more problematic: it turned out to be incomplete and poorly maintained because of the bureaucratic way in which it was managed. However, the MCPDL provided solutions for both problems, by extending the standard ISO 3166 list and by allowing an alternative for institutes without an institution code.

When the European genebank community started sharing passport information on a routine basis in the EURISCO database, the MCPDL was adopted for data exchange, extended with a few new descriptors, mainly for administrative purposes and to link the PGR accession to additional information. Users now have access to passport data for about one million accessions maintained in Europe. Recently, initiatives have been taken to include characterization and evaluation (C&E) data in this database. The issues regarding data quality in EURISCO, and the complexities caused by the lack of standardization of C&E data, will be discussed.