Framework on data quality

Most organizations today address data quality (DQ) problems with ad hoc fixes for the errors that cause them. However, such efforts do not ensure that data will be fit for use for every purpose. This Task Group is organizing the concepts related to both DQ needs and DQ solutions for the Assessment and Management of the fitness for use of biodiversity data. We expect the outcomes of the Task Group to allow the Biodiversity Informatics community to join efforts in tackling DQ issues by sharing and reusing DQ requirements, methods, tools, services, workflows and best practices for DQ measurement, validation, recommendation, and error prevention and correction.

GitHub

Image by Slava Bowman

This task group has completed its work. Please see the GitHub repository (linked above) for the results.

Convenor

Allan Koch Veiga

Motivation

  • A consistent approach to assessing and managing DQ is critical for biodiversity data users. However, achieving it has been particularly difficult because of the idiosyncrasies inherent in the concept of quality. DQ assessment and management cannot be performed unless the quality needs have been clearly established from a data user's standpoint.
  • We understand “DQ Assessment” as the act performed by data users or curators to judge how fit data (a single record or a dataset) are for a specific purpose, and “DQ Management” as the act performed by any actor (software, people, institutions) to improve DQ so that data become fit for a wider range of uses.
  • A conceptual framework should help the Biodiversity Informatics community describe, from a data user's perspective, the meaning of “fitness for use” in a common and standardized way.
  • A collaborative model can generate a searchable repository of common, reusable components such as DQ Profiles (definitions of what “quality” means for a specific purpose), DQ policies, dimensions (measurable aspects of quality), criteria, enhancements, specifications (methods) and mechanisms (tools, services, workflows) for a range of data uses, enabling institutions to compose their own DQ needs and solutions to better suit their fitness-for-use goals.
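
As an illustration only, the components listed above could be modelled along the following lines. This is a minimal sketch, not the Task Group's formal model: every class, field and method name here (Dimension, Criterion, Enhancement, DQProfile, ProfileRepository) is a hypothetical choice made for this example, and the two sample profiles are invented. The results in the GitHub repository remain the authoritative definitions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dimension:
    """A measurable aspect of quality (hypothetical structure)."""
    name: str
    description: str


@dataclass(frozen=True)
class Criterion:
    """A statement that data should satisfy to be fit for a purpose."""
    statement: str


@dataclass(frozen=True)
class Enhancement:
    """A description of a way to improve the data."""
    description: str


@dataclass
class DQProfile:
    """What "quality" means for one specific purpose of data use."""
    purpose: str
    dimensions: list[Dimension] = field(default_factory=list)
    criteria: list[Criterion] = field(default_factory=list)
    enhancements: list[Enhancement] = field(default_factory=list)


class ProfileRepository:
    """In-memory stand-in for a shared, searchable repository."""

    def __init__(self) -> None:
        self._profiles: list[DQProfile] = []

    def add(self, profile: DQProfile) -> None:
        self._profiles.append(profile)

    def search(self, keyword: str) -> list[DQProfile]:
        """Return profiles whose purpose mentions the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [p for p in self._profiles if needle in p.purpose.lower()]

    def dimensions_in_use(self) -> set[str]:
        """Names of every dimension referenced by at least one profile."""
        return {d.name for p in self._profiles for d in p.dimensions}


# Invented sample data: one dimension reused by two profiles.
completeness = Dimension(
    name="coordinate completeness",
    description="Whether a record carries both latitude and longitude.",
)

repo = ProfileRepository()
repo.add(DQProfile(
    purpose="Species distribution modelling",
    dimensions=[completeness],
    criteria=[Criterion("Coordinates must be present.")],
    enhancements=[Enhancement("Recommend coordinates derived from the locality text.")],
))
repo.add(DQProfile(
    purpose="National species checklist",
    dimensions=[completeness],
))

matches = repo.search("distribution")
```

Here the same dimension object is shared by both profiles, which is the kind of reuse a common repository is meant to enable: an institution composes a profile for its own purpose from components that others have already defined.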

Goals, outputs and outcomes

  • A formal Conceptual Framework for the Assessment and Management of the fitness for use of biodiversity data.
  • Establish a “common language” in which the Biodiversity Informatics community can express and share its understanding of DQ needs and solutions, increasing reusability and reducing duplication of effort.
  • A case study that describes how to use the Conceptual Framework for performing the Assessment and Management of fitness for use in an institution.
  • Methods and guidelines for using the Framework.
  • Establish a common vocabulary for the whole DQ Interest Group.

Strategy

  • Gather, organize and formalize ideas and concepts concerning DQ in a Conceptual Framework.
  • Evaluate the proposed Framework with a case study.
  • Propose a method for applying the Framework to the Assessment and Management of fitness for use.
  • Support the Biodiversity Informatics community with guidelines and training on the Framework.
  • Support and follow the application of the Framework to the Assessment and Management of fitness for use in some Biodiversity Informatics organizations.
  • Evaluate and enhance the Framework and its vocabulary by promoting discussions and forums with the members of the DQ Interest Group.

Becoming involved

  • This Task Group welcomes anyone who has a practical and theoretical interest in data quality and/or experience with ontologies, data/information/knowledge management, data policy, data governance, or any stage of the biodiversity data life cycle (capturing, handling or using data).
  • Contact the Convenor.

Resources