Proceedings of TDWG, 2009

The Global Biodiversity Information Facility (GBIF): The decentralised architecture

Samy Gaiji, Éamonn Ó Tuama, David Remsen, Vishwas Chavan, Tim Robertson

Abstract


In advancing the Global Biodiversity Information Facility (GBIF) “from prototype to full operation”, the community recognises the need to move to a more distributed and decentralised model based on the active engagement of more self-sufficient participant nodes.

This logical evolution of the GBIF network architecture aims first at increasing its capacity to rapidly mobilise and share a larger volume of biodiversity-related information, covering not only the existing taxon point occurrence data records but also other data types such as spatial, multimedia, names, and associated metadata.

To achieve this, the GBIF Secretariat developed in 2008 a first blueprint of its decentralisation strategy, which was presented at the TDWG 2008 Annual Conference. The focus has been on simplifying the process of contributing data for existing and new data publishers, as well as on improving the indexing frequency. Since then, GBIF has been actively engaged in delivering the first core components of its Informatics suite, namely the Integrated Publishing Toolkit (IPT), the Harvesting and Indexing Toolkit (HIT) and the Global Biodiversity Resources Discovery System (GBRDS).

Towards decentralising its architecture, GBIF has made some radical and fundamental decisions in its approach to information networking. First, it has recognised that the existing information retrieval protocols based on federated search (e.g. TAPIR, DIGIR, BioCASe) were unsuitable for the scale of growth expected within a global network. More importantly, GBIF recognised that the current Registry implementation based on UDDI was neither scalable nor rich enough to meet the growing needs of a network that requires discovering much more than single endpoints. Through the GBRDS, GBIF intends to bring forward the concept of a biodiversity network compass, available for use by all informatics initiatives to discover, locate and index a large variety of biodiversity data resources, services, schemas, etc. Finally, to enable successful interoperability and to facilitate accurate citation of data provenance, GBIF will provide solutions for the use of persistent and stable identifiers.

This presentation will provide an overview of the GBIF decentralised architecture and focus on how other informatics initiatives and networks (e.g. agro-biodiversity, invasive species) could fully benefit from these achievements.