Modelling Research Expeditions

The Expeditions Task Group creates best practice recommendations for the modelling and linking of expedition data.

Image by NOAA

Charter

A Task Group of the Collection Descriptions Interest Group

Convenors

  • Sabine von Mering, Museum für Naturkunde Berlin, DE
  • Siobhan Leachman, Wikimedia Aotearoa New Zealand, NZ
  • Dag Endresen, University of Oslo, NO

Core Members

  • Quentin Groom, Meise Botanic Garden, BE
  • Joaquim Santos, University of Coimbra, PT
  • Annika Hendriksen, Naturalis Biodiversity Center, NL
  • Elspeth Haston, Royal Botanic Garden Edinburgh, UK
  • Robert Cubey, Royal Botanic Garden Edinburgh, UK
  • Paul Braun, National Museum of Natural History, Luxembourg, LU

Summary

The Research Expeditions Task Group focuses on creating a model (schema) for a structured, linked open data (LOD) set for research expeditions within Biodiversity Information Standards (TDWG). Its goals include finding a stable identifier solution for research expeditions, developing guidelines for describing expeditions in Wikidata, and linking expeditions to entities such as museum objects, participants, and publications. We use Wikidata because it is a collaborative, multilingual platform that facilitates linking research expeditions to and from collection management systems and data aggregators. By proposing a Wikidata schema, providing guiding examples, and recommending best practices, the task group aims to enhance the accessibility and understanding of data related to expeditions. Stakeholders involved in this initiative range from developers to domain experts, including collection managers and data managers. The strategy involves regular virtual working sessions, GitHub for coordination, collaborative events with the biodiversity informatics community, and reporting progress to the TDWG community. Ultimately, the project seeks to create a dynamic dataset of research expeditions that can answer research questions, allow historical contextualization, and provide insights into the provenance and impacts of collected objects. In doing so, the task group supports the FAIR (Findable, Accessible, Interoperable, Reusable) principles in data management within the natural history and biodiversity research domain.

Motivation

Expeditions and other collecting events are a major source of objects in natural history museums (e.g., Mesibov 2021). Historically, these trips were often transdisciplinary: biological and Earth science specimens were collected at the same time as ethnological or anthropological objects. As a result, specimens, along with other materials collected during the same expedition, as well as the related data and metadata, are often distributed across multiple institutions. Many expeditions were driven by colonial agendas, aiming to discover new resources to exploit, and their findings were seldom shared with the source countries and local people. Understanding these expeditions illuminates the colonial origins of museum collections, and contributes to recognizing and addressing their impacts (e.g., Das and Lowe 2018, Ashby and Machin 2021).

Research expeditions continue to contribute to natural history collections. There is a need to establish clear links between historical or contemporary research expeditions and other entities such as collection specimens, which requires unambiguous labelling and persistent identifiers for such events. Stable identifiers, together with the sharing of metadata and descriptions in a wide range of languages, will facilitate access to scattered information about these events, the institutions housing specimens and objects, the participants and the locations visited, and assist with linking distributed material and related research data. However, structured data for scientific expeditions are currently lacking. Although identifier systems have been created for many entities over the last few decades, there is no dedicated identifier for research expeditions and similar events. Several studies have shown the importance of people identifiers for linking collection data (e.g., Groom et al. 2022), and we argue the same is true for expeditions. Wikidata QIDs can act as identifiers for expeditions in the same way that they act as identifiers for collectors without an ORCID.

Goals, outputs, and outcomes

  • Find a suitable stable identifier solution for research expeditions.
  • Explore whether Wikidata is an appropriate metadata authoring system for describing research expeditions.
  • Develop guidelines and best practices for describing research expeditions in Wikidata.
  • Identify existing terms for re-use and develop new Wikidata metadata properties and tools for describing research expeditions.
  • Develop metadata solutions for linking research expeditions to museum objects, to the people who collected these objects, and to publications about or resulting from research expeditions.
  • Investigate the use and linking of other digital platforms, such as Biodiversity Heritage Library, Wikimedia Commons, and Wikisource, for the storage and dissemination of multimedia items arising from and about expeditions.

Outputs:

  • Create or propose a Wikidata schema for scientific expeditions that reflects the established guidelines.
  • Develop guiding examples for different types of expeditions described in Wikidata, including links to entities such as participants, collections, publications, and itineraries.
  • Create a best practice document giving recommendations on creating and developing research expedition Wikidata items.

Outcomes:

A community-driven project creating a dynamic dataset of research expeditions as structured and linked open data that will

  1. collect and link distributed information on expeditions thus creating an expanded and dynamically growing dataset in Wikidata,
  2. answer research questions related to expeditions such as the provenance, circulation, and whereabouts of gathered collection objects,
  3. allow critical reflection and historical contextualization, especially in colonial contexts.

Strategy

  • Through a series of regular virtual working sessions, the task group will review, discuss and enrich the existing Wikidata items and create an updated Wikidata schema linking the event to other entities such as people, places, and publications.
  • GitHub will be used to co-ordinate the work and documentation. A task-driven work plan will be developed there which will be openly available. This platform will support the focused working sessions and will help identify and engage user participation from across the bio- and geodiversity community and other interested parties (e.g. from humanities, ethnological collections), from all parts of the world.
  • Exchange best practices and engage with the collections and biodiversity informatics communities, as well as the Wikidata community, via collaborative events such as workshops, edit-a-thons and hackathons.
  • Agree on or exchange workflows and the subsequent management (including making public) of the outputs of digitization, including reporting requirements for management and funding agencies.
  • Report progress and updates to the TDWG community, e.g. via presentations at TDWG conferences. The task group was initially proposed at TDWG 2023.
  • Update the WikiProject page with documentation outlining the recommended properties, best practice, Wikidata queries etc. to inform the Wikidata community.
  • Use cases from across the community will be documented and summarized. The schema will then be updated and tested against these use cases as part of the evaluation process, and adjusted as needed.
  • Prepare research articles on research expeditions including articles making use of linked open data, identifiers and metadata available on research expeditions.

Stakeholders

A range of key stakeholders can be identified as beneficiaries of the Wikidata schema. The task group will engage their participation, especially for implementing the schema.

Stakeholders include:

  • Developers of collection management systems,
  • Collection managers,
  • Data managers,
  • Wikidata community (general public, citizen scientists),
  • Domain experts/researchers from all relevant disciplines,
  • Global Biodiversity Information Facility (GBIF; biodiversity data aggregators) incl. GRSciColl,
  • GeoCASe,
  • Biodiversity Heritage Library (BHL, library and archives resources),
  • Bionomia

Becoming involved

  • This Task Group welcomes anyone interested in research expeditions, historic as well as contemporary, in particular people experienced in modelling information in Wikidata or in managing and linking entities related to expeditions (including research output such as publications, specimens and other objects, and archival material).
  • We welcome group members from other communities or collections that are not (or not entirely) focussing on natural history collections including members with expertise in critical reflection or colonial contexts etc.
  • Contact the Convenors via email (or the WikiProject page).

Context/History

Wikidata is a multilingual, community-curated knowledge base containing data structured in a human- and machine-readable format. It allows easy creation, updating and enriching of items on expeditions, and provides stable identifiers for them that can be used in collection management systems. Expeditions can be linked to participants and other agents, regions, localities, objects, archival material, maps, publications, field notebooks, documentary footage and artworks resulting from the expeditions. This makes historical information more easily accessible and assists with the acknowledgment of any imperial or colonial impact that may have resulted from the expedition. Expeditions in Wikidata can be hierarchical, e.g., a series of related events can be linked together or grouped under an umbrella project, providing a machine-readable way to harvest all project data. Wikidata can also provide links between present-day countries and historical names for locations (e.g., former colonial names). Expeditions published as Linked Open Data make datasets more FAIR (Findable, Accessible, Interoperable, Reusable), and are also useful in data transcription and validation processes. Visualization of itinerary data and travel routes also facilitates data quality checks.
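As an illustration of the machine-readable links described above, the following minimal sketch builds a SPARQL query that lists the participants recorded on an expedition item, together with a request URL for the Wikidata Query Service. It assumes that participants are modelled with the Wikidata property P710 (participant); the task group's recommended schema may use additional or different properties. The function names and the QID `Q123456` are hypothetical placeholders, not identifiers of a real expedition.

```python
import re
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# A Wikidata item identifier is "Q" followed by a positive integer.
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def participants_query(expedition_qid: str) -> str:
    """Return a SPARQL query listing participants of one expedition item.

    Assumption: participants are stated with P710 (participant).
    """
    if not QID_PATTERN.fullmatch(expedition_qid):
        raise ValueError(f"not a valid Wikidata QID: {expedition_qid!r}")
    return (
        "SELECT ?person ?personLabel WHERE {\n"
        f"  wd:{expedition_qid} wdt:P710 ?person .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}"
    )


def query_url(sparql: str) -> str:
    """Build a GET URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    # Q123456 is a placeholder; substitute the QID of an expedition item.
    print(query_url(participants_query("Q123456")))
```

The sketch only constructs the query and URL; sending the request (with a descriptive User-Agent header, as the query service asks of automated clients) is left to the caller.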

An informal working group of people interested in the topic was formed to discuss standards and share best practices and recommendations regarding terminology, data modelling and contextualisation. Building upon previous work (e.g., Bauer et al. 2022, von Mering et al. 2022, Leachman 2023), we aim to work towards the enrichment, linking and standardization of data about research expeditions. If the Wikidata identifiers of these expeditions and participants are added to the records of the corresponding entities in the collection management system, institutions can link from their own collection metadata to the relations made in Wikidata, including to collections in other institutions. The participants of the expedition can be further linked to specimens gathered during the expedition with the use of tools, such as Bionomia, which can facilitate data round-tripping between these collections and specimen records, the Global Biodiversity Information Facility (GBIF) and Wikidata (Shorthouse 2020). Other initiatives such as the Distributed System of Scientific Collections (DiSSCo) are also interested in incorporating these identifiers as links and annotations.
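The linking step described above can be sketched as follows: a collection record stores the Wikidata QIDs of the expedition and of its collectors, and these are expanded into resolvable entity URIs. The record structure, field names, catalogue number and QIDs are all hypothetical; real collection management systems will store these identifiers in their own way.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# Concept URIs of Wikidata entities share this prefix.
WIKIDATA_ENTITY_PREFIX = "http://www.wikidata.org/entity/"


@dataclass
class SpecimenRecord:
    """Hypothetical, simplified record from a collection management system."""
    catalog_number: str
    expedition_qid: Optional[str] = None
    collector_qids: List[str] = field(default_factory=list)


def entity_uri(qid: str) -> str:
    """Expand a QID such as 'Q123456' into its Wikidata entity URI."""
    return WIKIDATA_ENTITY_PREFIX + qid


def linked_uris(record: SpecimenRecord) -> Dict[str, Union[str, List[str]]]:
    """Collect the Wikidata URIs that a record links to."""
    links: Dict[str, Union[str, List[str]]] = {}
    if record.expedition_qid is not None:
        links["expedition"] = entity_uri(record.expedition_qid)
    if record.collector_qids:
        links["collectors"] = [entity_uri(q) for q in record.collector_qids]
    return links


if __name__ == "__main__":
    # Placeholder values only; none of these identify real items.
    record = SpecimenRecord("HYPO-0001", "Q123456", ["Q654321"])
    print(linked_uris(record))
```

Storing only the QID in the local record keeps the institution's data small, while the relations to participants, places and material in other institutions remain in Wikidata and can be retrieved on demand.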

Considering the global nature of the aims outlined, it is appropriate now to propose a TDWG Task Group to prepare a draft global standard on the topic.

Relation to other TDWG interest/task groups

Collection Descriptions (Interest Group): The proposed task group is closely aligned with the aims of the Collection Descriptions interest group. This group is developing a data standard for describing entire collections of natural history materials. This includes distributed specimens collected on a research voyage or expedition plus the related publications about research results based on these specimens and other collection objects. This requires the ability to define a research voyage and refer to it using a resolvable, persistent identifier.

Attribution (Interest Group): The Modelling research expeditions task group will contribute to the aims of the Attribution interest group in giving attribution for the collection of specimens and people participating in research expeditions.

People in Biodiversity Data (Task Group): The Modelling research expeditions task group will contribute to the aims of the People in Biodiversity Data task group by linking to and enriching data on collectors and participants in research expeditions as well as people who are using specimens, researching and/or publishing on research expeditions.

Resources

WikiProject Research expeditions: https://www.wikidata.org/wiki/Wikidata:WikiProject_Research_expeditions (in progress)

Bauer J, Burkhalter R, Karim T, Krimmel E, Landis M, Leachman S, Little H, Lorente M, Mills SK, Neu-Yagle N, Norton B, Paul D, Shorthouse D, Utrup J, Van Veldhuizen J, Walker L (2022). Guidelines for Using Wikidata to Mobilize Information about People in Collections: A Paleontology Perspective. https://doi.org/10.5281/zenodo.6977243

Leachman S (2023). Schema for Wikidata items for scientific collectors and research expeditions (Version 2). Zenodo. https://doi.org/10.5281/zenodo.8271285

von Mering S, Kaiser K, Petersen M (2022). Transforming Closed Silos into Shared Resources: Opening up databases on historical collection agents affiliated with the Museum für Naturkunde Berlin. Biodiversity Information Science and Standards 6: e93787. https://doi.org/10.3897/biss.6.93787

von Mering S, Braun PJ-C, Cubey RWN, Groom Q, Haston EM, Hendriksen A, Johaadien R, Leachman S, Marsden L, Rainer H, Santos J, Endresen D (2023). Modelling Research Expeditions in Wikidata: Best Practice for Standardisation and Contextualisation. Biodiversity Information Science and Standards 7: e111427. https://doi.org/10.3897/biss.7.111427

Written by Sabine von Mering, Dag Endresen, Siobhan Leachman, Elspeth Haston, Joaquim Santos, Paul Braun, Annika Hendriksen, Robert Cubey et al. January 2024

CC-BY 4.0