Symposia, workshops, & discussion sessions

SYM01 Applications of machine learning in biodiversity image analysis
SYM02 e-floras, e-faunas and species pages: Projects, methods, and tools
SYM03 Specimen data provision and mobilisation: DiSSCo community e-services and standards
SYM04 Where and how to find, store and use links between biodiversity data: the BiCIKL perspective
WKSH05 Translating TDWG Controlled Vocabularies
PD06 Challenges in curating interdisciplinary data in the biodiversity research community
SYM07 Digital Extended Specimens
SYM08 From the field, right on to your science table: Challenges of sharing ecological data
SYM09 APIs in Biodiversity Informatics–Innovation and Opportunities
WKSH10 Hands-on Activities to Learn OpenAPI and JSON:API for use in Collection Management Systems and Beyond
SYM11 Building collaborative resiliency through the Biodiversity Heritage Library
SYM12 Connecting biodiversity data with knowledge graphs
SYM13 Mushrooming, community science, and sharing biodiversity data
SYM14 Community building for our shared data future
SYM15 Maintaining the taxonomic backbone (or connecting those who try)
SYM16 Eat or be eaten: Don’t miss out on interaction data
SYM17 Assuring trust on community science biodiversity platforms: Policies and approaches
SYM18 (Re)Discovering known biodiversity: Digital accessible knowledge
UNCF19 The API Unconference

Last updated 27 September 2021

SYM01 Applications of machine learning in biodiversity image analysis

Session type: Symposium (unsolicited presentations considered)
Organizers: Quentin Groom, Meise Botanic Garden, Meise, Belgium; Elizabeth Ellwood, iDigBio, University of Florida, Los Angeles, CA, USA

Openly licensed images from natural history collection specimens number in the tens of millions, as do images of biodiversity from research, community science and camera traps. These numbers are likely to increase, as is their rate of growth. In parallel, numerous experiments with automated image analysis have demonstrated the feasibility and usefulness of machine learning with images of biodiversity. These techniques include species identification, recognising species interactions, image segmentation, object detection, trait detection, and trait measurement. The outlook is bright for future research on machine learning with these images and for new ecological research fueled by data from machine learning. Yet we are only now discovering what the limitations are and what infrastructure we need to underpin such research. This symposium will showcase some of the potential applications of machine learning, as well as the technical and scientific techniques used, with the aim of encouraging research in this field.

SYM02 e-floras, e-faunas and species pages: Projects, methods, and tools

Session type: Symposium (unsolicited presentations considered)
Organizers: Francisco Pando, GBIF Spain/Real Jardin Botánico-CSIC, Madrid, Spain; William Ulate, Missouri Botanical Garden, Saint Louis, MO, USA

The concept of separate web pages for each species on earth (Gewin, 2002) appears in many forms as “electronic species pages”: from Wikispecies and Encyclopedia of Life (EOL), to e-floras and e-faunas, to digital field guides. They have become the primary information source of taxonomic knowledge for the non-specialist.

Digital species catalogs display a wide variety of approaches intended for very diverse aims. Many of these projects, which tackle hundreds if not thousands of species and deal with the intrinsic complexity of biological species, their delimitation and interactions, tend to be long term and complex. Emerging technologies and methods (e.g., semantic interoperability, unique identifiers, deep learning, and massive collaborative approaches) can be an opportunity for these endeavors, but also a challenge to upholding the principles of Findability, Accessibility, Interoperability, and Reusability (FAIR). With this symposium we aim to showcase some of the most ambitious and innovative approaches to cataloguing and disseminating species-level information and to identify common ground to shape standards and recommended practices.

This symposium is directly linked to the TDWG Species Information Interest Group, and has connections to other TDWG groups, including Taxonomic Concept Transfer Schema (TCS), Biological Interactions Data, and Biodiversity Data Quality.

References: Gewin, V. All living things, online. Nature 418, 362–363 (2002). https://doi.org/10.1038/418362a

SYM03 Specimen data provision and mobilisation: DiSSCo community e-services and standards

Session type: Symposium (no unsolicited presentations considered)
Organizers: Wouter Addink, DiSSCo/Naturalis, Leiden, Netherlands; Sharif Islam, DiSSCo/Naturalis, Leiden, Netherlands

The institutions in Europe with natural science collections have started to define and develop e-services and standards as part of the development of the DiSSCo research infrastructure that will support the global biodiversity and geodiversity community. Funding through DiSSCo-linked projects has acted as a catalyst for developing new TDWG specifications such as CD and MIDS. This work builds on an earlier effort to collect about 600 life science and earth science user stories, and on a gap analysis of existing standards. The community is represented in DiSSCo (Distributed System for Scientific Collections) through CETAF (Consortium of European Taxonomic Facilities), and work is carried out through a series of European Commission-funded projects.

The goal of the session is to give an update on the current status of development and to explain how new TDWG standards under construction such as CD (Collection Descriptions) and MIDS (Minimum Information about a Digital Specimen) are going to support novel e-services such as ELViS, the European Loans and Visits System, which is already in operation with calls to request transnational access (visits) and virtual access (digitisation on demand) to natural science collections in Europe.

The session aims to start with a general overview of DiSSCo e-services, demonstrations and pilot projects in development, followed by more detailed presentations discussing some of the e-services and TDWG standards development in relation to DiSSCo, with a focus on the impact that this will have on collections and the wider community globally.

SYM04 Where and how to find, store and use links between biodiversity data: the BiCIKL perspective

Session type: Symposium (unsolicited presentations considered)
Organizers: Lyubomir Penev, Pensoft Publishers, Sofia, Bulgaria; Tim Robertson, Global Biodiversity Information Facility, Copenhagen, Denmark

Biodiversity data is a web of connections, between specimens, sequences, names, taxa, images, species interactions, traits, treatments, people and more. These connections are often clearly documented in the literature and are even available in digital form for interpretation. However, many, if not most, of these connections are not explicitly made to connect data across the internet. Making these connections explicit increases the findability of data, increases the efficiency of research and enables novel research. This symposium will examine how to find the links between the elements of biodiversity data, how and where they can be stored and how new information can be extracted from them. It is supported by the BiCIKL project, which over the next three years will build the Biodiversity Community Integrated Knowledge Library to connect the whole spectrum of biodiversity data, enabling new, cross disciplinary research on biodiversity.

The symposium welcomes contributions illustrating different approaches, methods and standards of discovery, preservation, annotation, management and re-use of linked biodiversity data, for example data warehousing, linking between Fair Digital Objects (FDO), Linked Open Data (LOD) technologies, and others.

WKSH05 Translating TDWG Controlled Vocabularies

Session type: Workshop (no unsolicited presentations considered)
Organizers: Steve Baskauf, Vanderbilt University Heard Libraries, Nashville, TN, USA; Paula Zermoglio, VertNet, Bariloche, Río Negro, Argentina

As an international organization, it is important for TDWG to make its standards widely available in as many languages as possible. Currently, TDWG has six ratified controlled vocabularies that are generally available only in English. As an outcome of these sessions, we will make term labels and definitions in those vocabularies available in the languages of translators who participate in the working sessions. The first hour of each session will introduce the concept of vocabularies, explain the distinction between term labels and controlled value strings, and describe how multilingual labels and definitions fit into the standards development process. The second hour of each session will be a time to actually translate the vocabularies. The two sessions are timed for the convenience of volunteer translators around the world. Individual translators or small groups of translators working in a single language will fill out Google Sheets with their translations. The resulting translations will be compiled along with attribution information for the translators and be made freely available in JSON and CSV formats.
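As a rough illustration of the compilation step described above, the following Python sketch turns spreadsheet-style rows into JSON and CSV output. The column names, the `ex:` term identifier, and the translator names are hypothetical and do not represent an actual TDWG format. The sketch keeps the controlled value string unchanged across languages while the label and definition vary, which is the distinction the sessions introduce.

```python
import csv
import io
import json

# Hypothetical rows, as they might be exported from a translators' sheet.
# Column names, the "ex:" term identifier, and the names are illustrative only.
ROWS = [
    {"term": "ex:example001", "value": "example", "lang": "es",
     "label": "ejemplo", "definition": "Una definición de ejemplo.",
     "translator": "A. Translator"},
    {"term": "ex:example001", "value": "example", "lang": "de",
     "label": "Beispiel", "definition": "Eine Beispieldefinition.",
     "translator": "B. Translator"},
]

FIELDS = ["term", "value", "lang", "label", "definition", "translator"]


def to_json(rows):
    """Group rows by term; the controlled value string is stored once,
    while labels and definitions are keyed by language tag."""
    by_term = {}
    for row in rows:
        entry = by_term.setdefault(
            row["term"], {"value": row["value"], "translations": {}}
        )
        entry["translations"][row["lang"]] = {
            "label": row["label"],
            "definition": row["definition"],
            "translator": row["translator"],  # attribution travels with the text
        }
    return json.dumps(by_term, ensure_ascii=False, indent=2)


def to_csv(rows):
    """Write one flat CSV row per term and language."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_json(ROWS))
    print(to_csv(ROWS))
```

Storing the translator with each language entry is one possible way to carry attribution through to both output formats.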

PD06 Challenges in curating interdisciplinary data in the biodiversity research community

Session type: Panel Discussion (no unsolicited presentations considered)
Organizers: Kimberly Cook, Indiana University, Bloomington, IN, USA; Inna Kouper, Indiana University, Bloomington, IN, USA

As research incentives become increasingly focused on collaborative work that spans multiple disciplines, more work is needed to address the technological and social challenges of curating interdisciplinary data. The challenges to curating biodiversity data include reconciling morphological trait descriptions and managing expectations for spatial and temporal resolution of datasets. Further, the biodiversity community needs to build bridges with research groups that study socio-environmental systems. This panel discussion will provide the space where people with a variety of experience curating interdisciplinary biodiversity data can share their knowledge and expertise. Panelists will focus on the ways in which disparate datasets are managed, linked, and shared within the biodiversity community, as well as how we can leverage our data to collaborate with partners outside academia. Insights from this discussion will help find common ground within an inherently data diverse scientific community and contribute to best practices and recommendations in working with interdisciplinary data. This session is part of a broader effort to investigate interdisciplinary data activities across all disciplines, develop workflows to support them, and provide informed recommendations for interdisciplinary teams.

SYM07 Digital Extended Specimens

Session type: Keynote + Symposium (no unsolicited presentations considered)
Organizers: Alex Hardisty, Cardiff University, Cardiff, UK; Andrew Bentley, University of Kansas, Lawrence, KS, USA

A plenary keynote talk will explore the kinds of science that can be enabled by Digital Extended Specimens as counterpart objects on the Internet coupled to physical specimens in collections; science that either can’t be done today or that is very difficult to do now.

A complementary symposium session will explore Digital Extended Specimens (DES), the outcomes of the 2021 global community consultation coordinated by the alliance for biodiversity knowledge, and the exciting possibilities that digital representations of the billions of specimens currently held in the world’s natural science collections offer for research. The session will also explore mechanisms for how our global community can collaborate, together with details and explanations of technicalities: the role of openDS and other standards, approaches to implementation, and implications for institutions.

The aim will be to demonstrate how people, processes and tools can unite and align around DES to work better together to build a fully integrated next-generation digital data infrastructure for biodiversity and other natural science data in which DES act as anchoring points for connecting the world of natural sciences specimens and other data, thus making reliable knowledge and evidence about the natural world available to all.

SYM08 From the field, right on to your science table: Challenges of sharing ecological data

Session type: Symposium (no unsolicited presentations considered)
Organizers: Yanina V. Sica, Map of Life/Yale University, New Haven, CT, USA; Paula Zermoglio, VertNet, Bariloche, Río Negro, Argentina

Access to high-quality ecological data, such as those generated from monitoring efforts, is pivotal to assessing and modelling biodiversity and its change through space and time. In the face of unprecedented biodiversity loss, inventory data are particularly relevant to meeting global monitoring and conservation goals. While the development and adoption of standards such as Darwin Core have ignited the mobilization and integration of incidental and opportunistic records, current standards are insufficient to capture the complexity and hierarchical structure of inventory data. This, together with a lack of sufficient incentives, has resulted in limited mobilization, integration and re-use of this type of data. Current efforts, namely the Humboldt extension to Darwin Core, are working towards developing standards to share inventory data. To maximize the efficacy of this and other initiatives in fully capturing the breadth of information available in inventory data, and to facilitate their broader applicability and use, it is critical that different parties converse and identify common needs and possibilities regarding data sharing. This symposium will bring together insights from a broad range of members within the community. It will constitute an opportunity to engage in the discussion to direct future efforts on standards development for sharing ecological data.

SYM09 APIs in Biodiversity Informatics–Innovation and Opportunities

Session type: Symposium (no unsolicited presentations considered)
Organizers: Ben Norton, North Carolina Museum of Natural Sciences, Raleigh, NC, USA; James Beach, Specify Collections Consortium, Lawrence, KS, USA

Application Programming Interfaces (APIs) are the foundation for the modern web. They enable independent applications, systems, or databases to share information securely and seamlessly while maintaining functional independence. From that capability, web-based APIs enable the development of community-level digital workflows that can integrate biodiversity data processing across platforms to create a global ecosystem of value-added services easily accessible from within local data systems. Innovative efforts in Web API development are central to addressing the many challenges in biodiversity informatics. This symposium’s primary focus is to explore how and where APIs are being utilized to facilitate a globally integrated biodiversity data infrastructure.

Topics include:

The strengthening and advancement of existing data standards using APIs.
Innovative ways APIs can and will integrate and connect a global biodiversity data community.
New data quality assessment and control techniques using API-driven services.
Current capabilities, limitations, and opportunities provided by APIs to enrich existing data infrastructures, standards, and repositories.
Approaches and methods for documenting and standardizing APIs.
The role of APIs in collections management systems and how these fit within the global data infrastructure.

WKSH10 Hands-on Activities to Learn OpenAPI and JSON:API for use in Collection Management Systems and Beyond

Session type: Workshop (unsolicited presentations considered)
Organizers: David P. Shorthouse, Agriculture and Agri-Food Canada, Ottawa, Canada; Falko Glöckler, Museum für Naturkunde Berlin, Berlin, Germany

We aim to raise awareness of the OpenAPI and JSON:API specifications for use in collection management and other services, under the umbrella of the TDWG Biodiversity Services and Clients Interest Group and the DINA Consortium (DINA = “DIgital information system for NAtural history data”). We will give a short introduction on why and how these specifications are core pieces of the DINA architecture and will then guide participants through tasks to help them create, share, and use OpenAPI and JSON:API. No coding skills are required. For technically oriented participants, we will identify packages and openly licensed libraries of code that make use of these specifications in a number of programming languages. Our goal is to improve how we model the communication layers within the software we create and between the projects we lead, through the use of consistent, battle-tested service layers used by many other communities. This will force us to refine how we communicate our requirements and concepts and how we design resources that can interoperate, and will ultimately help us to democratize our data and technologies.
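For readers unfamiliar with JSON:API, the Python sketch below shows the general shape of a JSON:API document: a top-level `data` member holding a resource object with `type`, `id`, `attributes`, and `relationships`. The resource types (`specimen`, `collection`), the attribute name, and the identifiers are hypothetical and are not taken from DINA or any other real system. The checker covers only a small part of the specification.

```python
import json

# Media type defined by the JSON:API specification.
JSONAPI_MEDIA_TYPE = "application/vnd.api+json"


def resource(type_, id_, attributes, relationships=None):
    """Build a JSON:API resource object.

    `relationships` maps a relationship name to a (type, id) pair and is
    expanded into a resource identifier object under "data".
    """
    obj = {"type": type_, "id": id_, "attributes": attributes}
    if relationships:
        obj["relationships"] = {
            name: {"data": {"type": rel_type, "id": rel_id}}
            for name, (rel_type, rel_id) in relationships.items()
        }
    return obj


def check_resource(obj):
    """Partial check only: a resource object needs string 'type' and 'id'."""
    problems = []
    for member in ("type", "id"):
        if not isinstance(obj.get(member), str):
            problems.append(f"missing or non-string '{member}'")
    return problems


# Hypothetical specimen record linked to a hypothetical collection.
doc = {
    "data": resource(
        "specimen",
        "1",
        {"catalogNumber": "ABC-0001"},
        {"collection": ("collection", "42")},
    )
}

if __name__ == "__main__":
    print(JSONAPI_MEDIA_TYPE)
    print(json.dumps(doc, indent=2))
    print(check_resource(doc["data"]))
```

An OpenAPI description would then document the endpoints that serve such documents. That part is omitted here to keep the sketch short.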

SYM11 Building collaborative resiliency through the Biodiversity Heritage Library

Session type: Symposium (unsolicited presentations considered)
Organizers: Constance A. Rinaldo, Biodiversity Heritage Library, Temple, NH, USA; Colleen Funkhouser, Biodiversity Heritage Library, Washington, DC, USA

The Biodiversity Heritage Library (BHL) is celebrating 15 years in 2021 and has been valued by the biodiversity community and others throughout this time. BHL, from its inception, had the goal of connecting data, uniting people through outreach, communication and engagement, and developing tools and Application Programming Interfaces (APIs) to improve access to biodiversity data. BHL’s key strength is the community. As the consortium has grown in members, content, and complexity, BHL has continued to exemplify stability and resiliency. The last year of lockdowns, isolation and working from home has demonstrated the importance of the BHL as a content-rich digital library, as well as a tool to connect biodiversity data. BHL has allowed work to continue that would otherwise not have been possible without physical access to the library. BHL partners around the world continued to enhance metadata and improve content delivery. As noted by Alice Lemaire, the COVID crisis was “an accelerator of digital advances and transformation”. BHL’s natural ecosystem is collaborating virtually, supporting open science, and building a robust community. BHL’s collaborative resiliency provides value to the greater biodiversity community and beyond. This symposium will focus on the trajectory of BHL’s strategic plan and review new tools and advances.

SYM12 Connecting biodiversity data with knowledge graphs

Session type: Symposium (unsolicited presentations considered)
Organizers: Roderic D.M. Page, University of Glasgow, Glasgow, UK; Franck Michel, University Côte d’Azur, CNRS, Inria, Sophia Antipolis, France

Linking biodiversity data together is an implicit goal of many initiatives in biodiversity informatics. Taxonomic databases were early adopters of the Resource Description Framework (RDF), a building block for the Semantic Web. From 2006 onwards they were serving up millions of taxonomic names in RDF, each with a globally unique identifier such as Life Science Identifiers (LSIDs), hence the field was seemingly ripe for a “biodiversity knowledge graph”, a network of interconnected data that we could use to discover links between taxa, their names, the relevant publications, specimens, occurrences, traits, phenotypes, DNA sequences, and the people whose work enabled the creation of that knowledge.

However, the biodiversity knowledge graph did not spontaneously assemble itself, suggesting that more work needs to be done before the knowledge graph becomes a key tool for integrating and exploring biodiversity data. This session will explore progress and prospects for constructing and exploiting biodiversity knowledge graphs, whether centered around broader efforts such as Wikidata or DBpedia, or more focussed on domain-specific initiatives such as Ozymandias and OpenBiodiv. In addition to these case studies, speakers in the session will explore the relationships between TDWG standards, data formats such as JSON-LD, and markup vocabularies such as Bioschemas, as well as the role nanopubs can play in disseminating and linking biodiversity knowledge. The session will also welcome presentations on the joint exploitation and cross-fertilization of biodiversity knowledge graphs and more “traditional” biodiversity data sources.
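To make the idea of discovering links in a knowledge graph concrete, here is a minimal Python sketch that stores subject-predicate-object triples in a list and searches for a chain of links between two nodes. All identifiers and predicate names use a made-up `ex:` prefix and are hypothetical. A real knowledge graph would use an RDF store and globally unique identifiers rather than an in-memory list.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object); the "ex:" names are
# placeholders, not terms from any real vocabulary.
TRIPLES = [
    ("ex:specimen1", "ex:identifiedAs", "ex:taxon1"),
    ("ex:taxon1", "ex:hasName", "ex:name1"),
    ("ex:name1", "ex:publishedIn", "ex:publication1"),
    ("ex:publication1", "ex:hasAuthor", "ex:person1"),
]


def find_path(triples, start, goal):
    """Breadth-first search over the triples, ignoring edge direction.

    Returns the list of triples connecting `start` to `goal`,
    or None if the two nodes are not connected.
    """
    adjacency = {}
    for triple in triples:
        subject, _, obj = triple
        adjacency.setdefault(subject, []).append((obj, triple))
        adjacency.setdefault(obj, []).append((subject, triple))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbour, triple in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [triple]))
    return None


if __name__ == "__main__":
    # Which chain of links connects a specimen to a person?
    for step in find_path(TRIPLES, "ex:specimen1", "ex:person1"):
        print(step)
```

In this toy graph the specimen is connected to the person through the taxon, its name, and the publication of that name.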

SYM13 Mushrooming, community science, and sharing biodiversity data

Session type: Symposium (unsolicited presentations considered)
Organizers: Rob Stevenson, University of Massachusetts Boston, Boston, MA, USA; Bill Sheehan, Fungal Diversity Survey (FunDiS), Athens, GA, USA

Fungi play essential roles in ecosystems by recycling matter and forming critical partnerships with plants. However, estimates suggest that less than 5% of fungal species are described in the taxonomic literature. This symposium brings together amateurs and professional scientists from four continents to describe how they and their organizations are documenting fungal diversity. Mushrooms are a natural entry point for amateurs to study fungal diversity because mushrooms have long been harvested by humans for food and for their medicinal properties. Talks describe the information technologies (apps, databases, platforms, standards including TDWG standards, and controlled vocabularies) that have shaped project development and evolution. Specific attention is given to the linkages being used to connect field observations with specimen collections and DNA sequences to help overcome the taxonomic uncertainties of fungi. Speakers describe strategies to work with amateurs, attitudes of amateurs toward participation, data quality issues, and the scientific products derived from their programs.

SYM14 Community building for our shared data future

Session type: Symposium + panel (unsolicited presentations considered)
Organizers: Holly Little, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA; Rebecca Snyder, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA

As we continue to expand the reach of our global informatics landscape across the biological, geological, and anthropological domains, we increasingly reinforce the idea that one can’t go it alone when working with data in this realm. Collaboration, coordination, and cross-functional efforts are essential to the success of creating FAIR (Findable, Accessible, Interoperable and Reusable) and CARE (Collective Benefit, Authority to Control, Responsibility, and Ethics) data, and implementing the supporting research infrastructures for data standards and identifiers. This session invites speakers to share existing or proposed models for structuring communities that help to ensure successful and sustainable coordination of the people driving this work with our increasingly interconnected global data resource. We are considering communities broadly, at the local or global level, with a focus on the informatics capacity across our organizations and programs. How do we as a community improve the sharing of best practices, expertise, and capacity? What are the challenges, successes, and impediments that we can learn from while building and strengthening our communities of practice?

SYM15 Maintaining the taxonomic backbone (or connecting those who try)

Session type: Symposium + demos (unsolicited presentations considered)
Organizers: Vijay Barve, Purdue University, Indiana, USA; Nicolas J. Dowdy, Milwaukee Public Museum, Milwaukee, WI, USA; Teresa J. Mayfield-Meyer, Milwaukee Public Museum, Albuquerque, NM, USA

Taxonomy defines our shared concept of every organism, acting as the foundational backbone of the biological sciences. Curated taxonomic name lists have traditionally been published in print and, more recently, online, but they are often rendered obsolete soon after publication and are only rarely maintained by those who published them. Those who create taxonomic name resources encounter various challenges and also devise some great solutions. In this session we propose to bring together members of TDWG to share tricks of the trade, learn from each other, and connect important resources in the taxonomic name game. We anticipate that the session will be a combination of symposium and demonstration. We invite abstracts from people or groups working on curated taxonomic name lists to present challenges, solutions, and workflows developed in the creation and maintenance of those lists.

SYM16 Eat or be eaten: Don’t miss out on interaction data

Session type: Symposium (unsolicited presentations considered)
Organizers: José Augusto Salim, University of São Paulo, São Paulo, Brazil; Maarten Trekels, Meise Botanic Garden, Meise, Flanders, Belgium

Scientists use a variety of methods to collect, record and store biological interaction data (e.g. predator-prey, host-parasite, pollinator-plant) and the uses for that data are equally diverse. At the same time, numerous efforts are underway to efficiently aggregate, organize and disseminate biotic interaction data. However, we do not have a formal standard to support biotic interaction data sharing and interoperability. The aim of this symposium is to provide an opportunity for those involved or interested in digitizing biological interaction data to share their experiences and ideas so that we can move forward and propose a common (standardized) model for sharing biotic interaction data. The Biological Interactions Data Interest Group (BID-IG) is a formal group within TDWG where this topic is being discussed within the biodiversity informatics community. During the symposium, we will provide an update on the group’s achievements, as well as welcome other interested parties to present their work, specifically those related to the different formats, models and repositories that have been used to share and integrate biotic interaction data.

SYM17 Assuring trust on community science biodiversity platforms: Policies and approaches

Session type: Symposium (no unsolicited presentations considered)
Organizers: Elizabeth Ellwood, iDigBio, Los Angeles, CA, USA; Rob Stevenson, University of Massachusetts Boston, Boston, MA, USA

Data collected by citizen scientists make up a strong and growing part of biodiversity occurrence data, yet there is still much discussion around the best approaches to produce, identify and make available high-quality, trustworthy data. How do citizen science managers, projects, and platforms streamline workflows such that the data produced are trusted and ready for research? What metrics are most helpful for evaluating trustworthiness in citizen science data? Is it possible to reach a point where trustworthy data are apparent and speak for themselves? In this symposium, hosted by the TDWG Citizen Science Interest Group, speakers will discuss trustworthy data, the ways in which projects and platforms may programmatically classify, identify and account for data quality, and the role of standards and protocols in robust citizen science projects.

SYM18 (Re)Discovering known biodiversity: Digital accessible knowledge

Session type: Symposium (no unsolicited presentations considered)
Organizers: Donat Agosti, Plazi, Bern, Switzerland; Alexandros Ioannidis-Pantopikos, Zenodo - CERN, Meyrin, Switzerland

Scientific publishing builds up our knowledge by connecting facts through an intricate network of implicit and explicit citations. Taxonomic publications are exceptionally rich in such citations, ranging from explicit citations of publications to implicit citations provided by taxonomic names, treatment citations, materials, actors, and collections, to a domain-specific vocabulary. In a sense, we are still living in an analogue world, because most of our knowledge is either not digital at all or, at best, made for human consumption rather than available as digital accessible knowledge (DAK). DAK are facts that are both human and machine readable and that are open, findable, accessible, interoperable and reusable, as proven by their reuse by external services such as the Global Biodiversity Information Facility (GBIF). A second aspect of DAK is that its citations are annotated with their respective identifiers. This allows easy integration and/or connection with existing DAK.

Discovering known biodiversity is a challenge taken on by Plazi. This requires a defined target resolution of data, sustainable infrastructures, workflows, reference vocabularies, resources, and strategies to discover and convert a daunting and rapidly growing corpus of ca. 500 million pages of publications. Current technical and persistent identifier developments will be highlighted.

UNCF19 The API Unconference

Session type: Unconference
Organizers: Deborah Paul, Illinois Natural History Survey, Species File Group, Champaign, IL, USA; Quentin Groom, Meise Botanic Garden, Meise, Belgium; Nicky Nicolson, Kew, Richmond, London, England; Matthew Yoder, Illinois Natural History Survey, Species File Group, Champaign, IL, USA; David Shorthouse, Ag-Canada, Ottawa, Canada; Cat Chapman, iDigBio, University of Florida, Gainesville, FL, USA; Mike Trizna, Office of Chief Information Officer, Smithsonian, Washington, D.C., USA

This half-day workshop/hackathon will explore how connected our global community is, using as a proxy how our collective Application Programming Interfaces (APIs) can be theoretically or practically linked together. Our focus on existing data-serving APIs has several benefits. First, it focuses on software that has actually been built, rather than proof-of-concept pilot projects; this exposes and illuminates the differences between standards and implementations so that improvements can be made on both sides. Second, it explores a long-term future scenario in which every data provider exposes their own API, and scientists and users crawl across “the source” rather than inferred or interpreted versions of the source. That is, if we are unable to link the existing ecosystem of APIs at present, where can we make improvements so that this vision is met?

First, together we will quickly enumerate the existing APIs we each know about. Our goal is not to be comprehensive, but rather to collectively aggregate known application resources (this itself is a nice proxy test of accessibility). From there we will proceed to the best of our ability, in the time provided, by linking the APIs, and documenting these linkages, in three ways: 1) by quickly observing that some combination of responses from two or more APIs could be used to do X, i.e. “theoretically linked”; 2) by identifying sets of specific (real) API calls across two or more providers that represent a specific use case, i.e. “demonstrably linked”; and 3) time permitting, by coding tiny proofs of concept that demonstrate integration across two or more APIs, i.e. “actually linked”. Each of these three ways can be run in parallel based on participants’ interest. We plan to offer a small prize in each category.
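One possible way to record linkages in the three categories above is sketched below in Python. The `Linkage` record, its field names, and the provider names and URLs (which use the reserved `example.org` domain) are all hypothetical. The classification rule is an assumption based on the descriptions above, not a scheme defined by the organizers.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    THEORETICAL = "theoretically linked"
    DEMONSTRABLE = "demonstrably linked"
    ACTUAL = "actually linked"


@dataclass
class Linkage:
    """A documented link between two or more APIs (hypothetical record)."""
    use_case: str
    providers: list
    calls: list = field(default_factory=list)  # specific API calls, if identified
    has_code: bool = False                     # True if a proof of concept exists

    def level(self):
        """Classify the linkage; assumes code implies the strongest level."""
        if len(self.providers) < 2:
            raise ValueError("a linkage needs two or more APIs")
        if self.has_code:
            return Level.ACTUAL
        if self.calls:
            return Level.DEMONSTRABLE
        return Level.THEORETICAL


# Placeholder providers and URLs, for illustration only.
idea = Linkage("names to literature", ["Provider A", "Provider B"])
recipe = Linkage(
    "names to occurrences",
    ["Provider A", "Provider C"],
    calls=[
        "GET https://a.example.org/names?q=Puma+concolor",
        "GET https://c.example.org/records?name_id=123",
    ],
)

if __name__ == "__main__":
    for linkage in (idea, recipe):
        print(linkage.use_case, "->", linkage.level().value)
```

Keeping the calls as plain strings makes each "demonstrably linked" entry something another participant can try directly.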
