Data quality use cases
The Task Group will produce a report listing use cases in which data are assessed for suitability for a particular purpose. Each use case description will include the data required, and the quality dimensions and thresholds used to assess the data or dataset. Together, these should provide a reference set of information that can be used to assess data suitability for particular purposes.
- Other than data availability, ‘data quality’ is probably the most significant issue for users of biodiversity data, and this is especially so for the research community.
- This Task Group is reviewing practical, real-world uses relating to ‘data quality’, with the goal of documenting current best practice.
- If a list of practical data quality use cases can be provided to users of biodiversity records, then greater and more appropriate use could be made of biodiversity data. Data providers, and particularly aggregators such as GBIF and its nodes, would gain credibility with user communities and be able to provide more effective examples of assessing fitness for use.
- The other Data Quality Task Groups will focus on an overview or framework (TG1) and tools, services and workflows (TG2).
Goals, outputs and outcomes
A set of use cases that are in use by agencies and user communities to select records and/or datasets for particular purposes (March 2016). The extent of the report will depend on the agencies and user communities that respond.
- The use cases will be documented in a structured format based on Toward A Conceptual Framework for the Assessment and Management of the Fitness for Use of Biodiversity Data (Veiga et al. in press).
- The use case template will be placed in a collaborative editing environment for completion and discussion.
- Via the Task Group participants, contact other government and conservation agencies and user communities to establish and document use cases in which they assess data fitness for use, along with the data, quality dimensions and thresholds required.
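As a purely illustrative sketch (the field names below are assumptions, not the Task Group's actual template, which will be based on Veiga et al.'s fitness-for-use framework), a structured use case record capturing the data required, quality dimensions and thresholds might look like this, together with a minimal fitness check:

```python
# Hypothetical sketch of a structured data-quality use case record.
# Field names are illustrative assumptions; the real template will follow
# the Veiga et al. fitness-for-use framework. Darwin Core terms such as
# coordinateUncertaintyInMeters are used only as examples.
use_case = {
    "purpose": "Species distribution modelling",
    "data_required": [
        "scientificName",
        "decimalLatitude",
        "decimalLongitude",
        "eventDate",
    ],
    "quality_dimensions": [
        "taxonomic accuracy",
        "coordinate precision",
        "temporal completeness",
    ],
    "thresholds": {
        # Example threshold: records must be georeferenced to 1 km or better.
        "coordinateUncertaintyInMeters": 1000,
    },
}

def meets_threshold(record, use_case):
    """Return True if a record's coordinate uncertainty satisfies the
    use case's threshold; False if it is missing or too coarse."""
    value = record.get("coordinateUncertaintyInMeters")
    limit = use_case["thresholds"]["coordinateUncertaintyInMeters"]
    return value is not None and value <= limit

print(meets_threshold({"coordinateUncertaintyInMeters": 250}, use_case))  # True
```

The point of the structured format is exactly this kind of machine-checkable pairing of a stated purpose with explicit dimensions and thresholds, so that different agencies' use cases can be compared and reused.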
This Task Group welcomes anyone who has a practical interest in data quality and/or experience in selecting datasets and records for specific purposes.
References
- Belbin, L., Daly, J., Hirsch, T., Hobern, D. and LaSalle, J. (2013). A specialist’s audit of aggregated occurrence records: An ‘aggregator’s’ response. ZooKeys 305: 67-76. https://doi.org/10.3897/zookeys.305.5438.
- Chapman, A.D. (2005a). Principles and Methods of Data Cleaning - Primary Species and Species Occurrence Data, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. 75 pp. http://www.gbif.org/resource/80528.
- Chapman, A.D. (2005b). Principles of Data Quality, version 1.0. Report for the Global Biodiversity Information Facility, Copenhagen. 61 pp. https://doi.org/10.15468/doc.jrgg-a190.
- Costello, M.J., Michener, W.K., Gahegan, M., Zhang, Z.-Q., Bourne, P. and Chavan, V. (2012). Quality assurance and intellectual property rights in advancing biodiversity data publications, version 1.0. Copenhagen: Global Biodiversity Information Facility. 40 pp. ISBN 87-92020-49-6.
- Mesibov, R. (2013). A specialist’s audit of aggregated occurrence records. ZooKeys 293: 1-18. https://doi.org/10.3897/zookeys.293.5111.
- Otegui, J., Ariño, A.H., Encinas, M.A. and Pando, F. (2013). Assessing the Primary Data Hosted by the Spanish Node of the Global Biodiversity Information Facility (GBIF). PLoS ONE 8(1): e55144. https://doi.org/10.1371/journal.pone.0055144.