Data Standards – The Future for TDWG?
What are "Data Standards" and why do we need them?
TDWG’s Technical Architecture Group produces a “Roadmap” document each year that reviews the current standards and makes recommendations for action. The Roadmap for 2007 identifies a credibility gap concerning TDWG standards:
Most of the standards are more than 10 years old and paper based. Some are available electronically but the status of the paper and electronic versions is not clear. For example, can paper versions of standards be scanned and freely distributed?
The Roadmap recommends that:
TDWG must address the issue of 'Data Standards'. Some of the more successful TDWG standards are not data exchange standards but controlled lists of abbreviations or resources such as Index Herbariorum, Authors of Plant Names, TL2 and BPH. These standards are all ratified as paper-based publications. They have all been superseded by on-line versions that have no status within TDWG. None of the on-line versions have interfaces that allow integration into other standards.
TDWG needs a mechanism whereby a dynamic list that is only available through a web service can be standardised. This mechanism needs to specify technical data access and legal issues.
The Executive Committee should seek out resources to establish this mechanism within the next year as a matter of urgency.
This document proposes such a mechanism.
Standardised Data Stifles Science?
Most ecologists would appreciate a standard list of taxa, because it would let them work more efficiently than relying on a continuously changing classification. Most taxonomists would regard such a list as anathema: there is no taxonomic group on which all taxonomists agree.
Any standardised list or dataset raises similar problems, whether the list is metadata about the data or the data itself.
The situation is more complex if the data is dynamic. An “improvement” to the data from one perspective may be deleterious from another perspective. A frozen list of facts can be peer reviewed. A dynamic list cannot.
Form and Substance
Even if we assume that it is not possible to standardise any data, we need not rule out standardising “data sources”, i.e., separating the “service” from the “data”.
The benefits of this distinction become apparent when we examine the questions a data consumer might ask, especially one who intends to build an application or project on data available online, which is exactly what TDWG hopes to enable.
| Category | Users' Questions |
|---|---|
| Availability | |
| Curation | |
| Legal | |
| Conformance | Does the service really meet the standards set out above? |
None of the questions is likely to involve an absolute measure of data quality. The quality of the data might be suitable for some applications and not for others, and it may change over time. What matters more is that there are known mechanisms for curating the data and known mechanisms for influencing that curation.
Take a restaurant analogy. Local authorities can take steps to protect public health, but they don't legislate for food to taste good. Taste is left to the restaurant's customers: some may like the food and others may not. TDWG standards can tell you what ‘food’ is available but can’t tell you what is good.
Definite and Indefinite Articles
There are significant differences between TDWG passing a standard for “A Source of Abbreviations for Author Names of Plants” and one for “The Source of Abbreviations for Author Names of Plants”. The former should be routine and subject-neutral, while the latter is contentious. There is, however, a grey area: if TDWG passes only a single source for a particular data type, it implies that this is THE data source of choice for that kind of data.
Recommendations
- TDWG should issue standards for data sources.
- These standards should be of the indefinite-article sort by default: “A Source of Data about X”, not “The Source of Data about X”.
- These standards should follow a template based on the questions asked above. Whether the template itself needs to be a standard is debatable.
- Data source standards should be reviewed on a yearly basis and withdrawn if the data source is not meeting the terms of service set out in the standard.
- In exceptional circumstances, TDWG should be able to withdraw a standard before the yearly review.
- The TDWG collaborative infrastructure must include a feedback mechanism for people to report breaches of the terms of service set out in the standard. These reports can feed into the review process. The mechanism may be as simple as an email address.
Benefits to Data Suppliers
Data suppliers publish their data because they want it to be used. They usually want to continue managing and improving that data. Having their data source certified by TDWG brings them a number of benefits.
- Community recognition. It shows that the community recognises them as a valid supplier of data.
- Justification for funding. They can demonstrate to potential funders that they are contributing to the scientific community. This helps to offset the lack of peer-reviewed papers that this kind of activity tends to produce.
- Quality control. There is a mechanism to tell them if they are no longer meeting the requirements of their users.
- Increased involvement in other projects. Large-scale projects, such as GBIF, EoL and ALA, will come to rely on standardised services in a way that they could not rely on non-standard services. This creates an obligation to channel funding towards curating the associated data in order to maintain the service, which is unlikely to happen while curation is seen as an end in itself. This is a further justification for funding.
Comments
Kehan Harman from Herbarium, Royal Botanic Gardens
Friday, 23-11-07 12:04
Some of these standards have also been superseded by paper based publications eg:
It's also critical to have some kind of statement about the suitability / future of these new paper versions.