1. Abstract
  2. Symbols and terminology conventions
  3. Introduction
  4. General Aspects
  5. TAPIR XML Documents
  6. Operations
  7. Global Parameters
  8. Counting and Paging
  9. Filters and Expressions
  10. KVP (Key-Value Pair) Requests
  11. The TAPIR XML Schema
  12. Appendix

TDWG Access Protocol for Information Retrieval (TAPIR) Specification

This version: TAPIRSpecification_2007-02-24.html

Latest version: www.tdwg.org/activities/tapir/specification

Previous version: www.tdwg.org/dav/subgroups/tapir/1.0/docs/TAPIRSpecification_2007-02-07.html

Date: 24 February 2007

Technical Writers

Reviewers

Main Authors of TAPIR

Main Collaborators

Protocol Revision History

Status of this document

Working draft under revision. This document may be updated, replaced or obsoleted by other documents at any time.

This document specifies the TAPIR protocol and the structure and syntax of TAPIR messages. It is a product of the TDWG TAPIR task group, details of which can be found at www.tdwg.gbif.org/subgroups/tapir/. Publication as a Working Draft does not imply endorsement by TDWG as a standard.

The English language version of this document is the only normative version being prepared.

Comments about this document can be sent to the TAPIR mailing list: tdwg-tapir@lists.tdwg.org. Subscription and archives are open to the public.

About the examples

This specification makes use of numerous examples that illustrate the various aspects of the TAPIR protocol and its implementation. Most examples refer to fictitious servers and data (unless otherwise declared) and may not be complete (well formed and valid) since they usually focus on specific parts of TAPIR messages, omitting others for the sake of clarity.


Copyright Notice

Permission to use, copy and distribute this document in any medium for any purpose and without fee or royalty is hereby granted, provided that you include attribution to the Taxonomic Database Working Group and the authors listed above. We request that the authorship attribution be included in any software, documents or other items or products that you create related to the implementation of the contents of this document.

This document is provided "as is" and the copyright holders make no representations or warranties, express or implied, including, but not limited to, warranties of merchantability, fitness for any particular purpose, non-infringement, or title; that the contents of the document are suitable for any purpose; nor that the implementation of such contents will not infringe any third party patents, copyrights, trademarks, or any other rights.

The TDWG and authors will not be liable for any direct, indirect, special or consequential damages arising out of any use of the document or the performance or implementation of the contents thereof.


1. Abstract

This specification defines the TDWG Access Protocol for Information Retrieval. TAPIR specifies a standardised, stateless, HTTP transmittable, request and response protocol for accessing structured data that may be stored on any number of distributed databases of varied physical and logical structure. TAPIR requests can be formulated as either XML documents or as Key-Value Pairs in HTTP calls. TAPIR responses are always encoded in XML.

Individual service requests and responses are called operations. TAPIR operations include ping, metadata, capabilities, inventory and search. TAPIR does not include operations for adding, updating or deleting data or making any other changes to provider databases. Its functionality is restricted to query operations.

TAPIR is not bound to any particular conceptual schema or output model for its main operation (search). Request and response documents reference external concept definitions and external output models that can be defined by networks or by data providers themselves. The structure of data provider databases remains opaque to TAPIR users, as it must be mapped to one or more conceptual schemas that are declared in TAPIR capabilities responses. TAPIR queries can therefore be formulated based on these conceptual schemas.

TAPIR's flexibility makes it suitable both for very simple systems, where the provider only responds to fixed queries using provider-declared query templates and output models, and for more complex systems, where the provider software can dynamically parse any user-supplied query template and output model, provided that they validate against the system's declared XML capabilities and concept mappings.

Although TAPIR requests and responses can be validated using an XML Schema (tapir.xsd), the full TAPIR protocol contains additional rules that need to be followed and are specified in this document.

2. Symbols and terminology conventions

The following conventions are used throughout this document to aid clarity:

<.....> Enclosing angle brackets indicate that the enclosed term is an XML element name.
@ @ sign before a term indicates that it is an attribute name (but @ is not part of the name).
normative verbs The terms "must", "must not", "should" and "may" are applied in the normative sense in relation to the specification.

Following are symbols used in regular expressions for defining elements and terms:

::= Used to indicate the content equivalent of an element or term.
? When used after an element or term in a term definition, indicates that it is optional. When used in a URL it indicates a parameter list for a GET statement.
+ Indicates an element or term must be represented one or more times.
* Indicates an element or term can occur zero or more times.
| A vertical bar between terms or elements indicates alternatives, as in a choice from a list.

3. Introduction

3.1. Background

The rapid expansion of the Internet and use of the World Wide Web has been accompanied by the desire to create new information services capable of drawing data from large numbers of data providers, regardless of their data management systems or geographic location. There are many practical problems involved in marshalling data from many sources, the great variation in data stores, data structure and data quality being particularly challenging.

XML has become a widely used method to define structured content and share information over the Internet. An increasing number of protocols and data standards are being defined using XML, and even Semantic Web technologies make extensive use of XML to encode data.

Formal description of how specific classes of XML documents should be structured can be done using XML Schema. XML schemas are effective for describing domain-specific data, usually referred to as content schemas, like DarwinCore and ABCD. XML schemas are also useful for the development of accompanying protocols for transmitting requests and responses between services. There have been two major initiatives in the development of biodiversity information networks based on these technologies. In the United States, The Species Analyst Network developed the DarwinCore schema and the DiGIR protocol. In Europe, the TDWG ABCD schema and BioCASe protocol have been used to establish the BioCASE network. The DiGIR and BioCASe protocols have many similarities but are not interchangeable, which is clearly not desirable for global interoperability.

The importance of developing a unified protocol that could handle different content schemas was discussed at the GBIF Data Access and Database Interoperability sub-committee meeting held in Oaxaca in 2004. This discussion led to GBIF commissioning a study that was published by Döring and De Giovanni in September 2004. This study included a proposal for an integrated protocol. The proposal evolved from a number of preliminary proposals that had considered alternative technologies including SOAP/WSDL, WFS and XQuery. These alternatives and the reasons why they were not adopted at that time are discussed in the GBIF study. A reference implementation based on the initial proposal was delivered in August 2005 as a new version (v2) of the PyWrapper software.

Work continued on the protocol, with a further meeting held in Madrid in November 2005, where major refinements of the protocol were agreed. A "feature freeze" in January 2006 enabled documentation to be initiated.

3.2. Scope

TAPIR is an acronym for TDWG Access Protocol for Information Retrieval. TDWG is an acronym for the Taxonomic Databases Working Group.

TAPIR is a Web Service protocol to perform queries across distributed and heterogeneous data sources. TAPIR is intended for communication between applications, using HTTP to transmit XML messages, but also accepting simple Key-Value Pairs in URL encoded requests.

TAPIR uses the XML Schema Definition (XSD) language to describe and validate the structure of request and response messages sent between a client (the requesting software) and a server (the provider of the data or service). TAPIR provides the means to query data suppliers based on conceptual schemas, query templates and output models that are usually defined by one or more federated networks. TAPIR was designed to be independent of any particular conceptual schema, query template or output model. The protocol also does not include functions for adding, deleting or changing data on databases.

When first developed, TAPIR was envisaged as a tool for unifying existing biodiversity data sharing networks based on the DiGIR and BioCASe protocols. During development, TAPIR has been refined into a generic product that has the potential to enable interoperability with domains other than biological observations and specimen collections, including geological, ecological, climate, gene sequence, geospatial data and others. The development of a generic tool has led to increasing convergence with similar tools and standards used in a wider context, for instance the OGC Web Feature Service (WFS) standard (where biodiversity data can be regarded as a geospatial feature), although significant differences remain.

3.3. Design Goals

The guiding objectives of TAPIR are to create a protocol for biodiversity information sharing that is:

3.4. Main Purpose

TAPIR's purpose is to provide a standard way to query distributed data providers, to retrieve service metadata and technical capabilities, and to monitor service availability. Networks are free to define query templates, XML output models and custom conceptual schemas that can be used to formulate queries. TAPIR therefore provides a means to abstract data heterogeneity through community-defined conceptual schemas, and also to get different XML output from the same data source.

3.4.1. Automated System Interaction

TAPIR requests may be purely software-based, covering communication between systems, establishing if a system is on-line, determining its capabilities, and seeking metadata about data holdings that may be used to guide future data requests. Inventory and search requests can also be used as a means of automatically updating indexes or building centralised data caches.

3.4.2. Interaction with End-users

The protocol can also convey user-entered data requests to remote systems via a query marshalling program that formulates queries from parameters entered or selected by users. The request may be sent directly to one or more TAPIR access points (URLs providing access through the protocol to on-line databases) or may be routed to access points through a message broker on a different system. TAPIR providers usually have their own "wrapper" software, which interprets the search request, runs a local query and formats the data according to an output model for transmission back to the original query program. The returned data may be merged from several sources and presented back to the original user in a variety of formats including a tabulated form, a map, or as a downloadable file.

This specification covers only direct communication with TAPIR access points. It is not intended to validate communication with message brokers that typically aggregate responses from several sources.

4. General Aspects

4.1. Available Operations

The TAPIR protocol defines five operations: ping, metadata, capabilities, inventory and search.

Each operation consists of a request and the corresponding response. TAPIR is a stateless protocol, in that operation calls are always completely independent from each other.

4.2. Message Transport

TAPIR messages (requests or responses) are intended to be transmitted by means of the Hypertext Transfer Protocol (HTTP). TAPIR supports both GET and POST methods for requests.

4.3. Access Points

TAPIR access points are represented by an HTTP Uniform Resource Locator (URL) that identifies the service. They actually correspond to the prefix of a URL, which includes the protocol (http or https), the host name, an optional port number and an optional path with or without a script name. To interact with a TAPIR service using HTTP GET, it may be necessary to append additional parameters to the URL, in which case the service access point (the URL prefix) should always remain the same for all operations.

The URL used to interact with a TAPIR service must be valid according to the HTTP Common Gateway Interface (CGI) standard, and the query part of the URL must be percent-encoded according to RFC 3986, using UTF-8, to protect special characters.
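
The encoding rule above can be illustrated with a minimal Python sketch that appends percent-encoded parameters to a fixed access point. The access point URL and the "note" parameter are made up for illustration and are not defined by TAPIR; the sketch also assumes the access point carries no query string of its own.

```python
from urllib.parse import quote, urlencode


def build_request_url(access_point: str, params: dict) -> str:
    """Append percent-encoded (UTF-8) parameters to an access point."""
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode(params, quote_via=quote, encoding="utf-8")
    return f"{access_point}?{query}"


url = build_request_url(
    "http://example.net/tapir.cgi",                   # fictitious access point
    {"op": "ping", "note": "Döring & De Giovanni"},  # "note" is hypothetical
)
print(url)
```

The access point prefix stays identical for every operation; only the query part changes.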

Access points can be used by UDDI in service discovery.

4.4. Message Encoding

4.4.1. Requests

TAPIR requests can be encoded in two ways:

4.4.1.1. Key-Value Pairs (KVP) Parameters

Requests are possible through HTTP GET or POST with the specific KVP parameters for each operation. Support for KVP request encoding is mandatory for all TAPIR service implementations.

An example of a KVP TAPIR capabilities request is:

http://example.net/tapir.cgi?op=capabilities
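
As a rough sketch of how a receiving service might read such a request, the following Python function extracts the "op" parameter from a URL and checks it against the five operation names. The function name is hypothetical, and raising an error for a missing or unknown value is only this sketch's choice, not a rule taken from this specification.

```python
from urllib.parse import parse_qs, urlsplit

# The five operation names used throughout this document.
OPERATIONS = {"ping", "metadata", "capabilities", "inventory", "search"}


def requested_operation(url: str) -> str:
    """Return the operation named by the "op" parameter of a KVP request."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("op", [])
    if len(values) != 1 or values[0] not in OPERATIONS:
        # Assumption of this sketch: treat anything else as an error.
        raise ValueError(f"missing or unknown operation: {values!r}")
    return values[0]


operation = requested_operation("http://example.net/tapir.cgi?op=capabilities")
print(operation)
```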

4.4.1.2. XML

TAPIR requests can be encoded entirely as an XML document. The XML request encoding is optional for TAPIR service implementations. Messages can be sent to a service in two ways:

XML requests have the following general format:

<?xml version="1.0" encoding="utf-8" ?>
<request xmlns="http://rs.tdwg.org/tapir/1.0">
  <header>
    <!-- header specific elements -->
  </header>
  <operation_name>
    <!-- operation specific parameters -->
  </operation_name>
</request>

A capabilities request would take the following form:

<?xml version="1.0" encoding="utf-8" ?>
<request xmlns="http://rs.tdwg.org/tapir/1.0">
  <header>
    <!-- header specific elements -->
  </header>
  <capabilities/>
</request>
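
A client could assemble such a request programmatically. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name build_request is hypothetical, and the header is reduced to a single <source> element with a "sendtime" attribute.

```python
import xml.etree.ElementTree as ET

TAPIR_NS = "http://rs.tdwg.org/tapir/1.0"

# Serialise the TAPIR namespace as the default namespace (no prefix).
ET.register_namespace("", TAPIR_NS)


def build_request(operation: str, sendtime: str) -> bytes:
    """Build a minimal XML request: a header followed by one operation element."""
    request = ET.Element(f"{{{TAPIR_NS}}}request")
    header = ET.SubElement(request, f"{{{TAPIR_NS}}}header")
    ET.SubElement(header, f"{{{TAPIR_NS}}}source", {"sendtime": sendtime})
    ET.SubElement(request, f"{{{TAPIR_NS}}}{operation}")
    return ET.tostring(request, encoding="utf-8", xml_declaration=True)


xml_bytes = build_request("capabilities", "2005-11-11T12:23:56.023+01:00")
print(xml_bytes.decode("utf-8"))
```

The resulting document is not validated against tapir.xsd here; that step would need a separate schema-aware library.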

4.4.2. Responses

TAPIR responses should always be encoded in valid XML and have the HTTP Content-Type set to "text/xml". They normally include a header section, the specific result of the requested operation and an optional diagnostics section. Search responses do not follow this structure when the "envelope" parameter is set to "false"; in that case, the response structure is completely determined by the output model definition.

<?xml version="1.0" encoding="utf-8" ?>
<response xmlns="http://rs.tdwg.org/tapir/1.0">
  <header>
    <!-- header specific elements -->
  </header>
  <operation_name>
    <!-- operation specific results -->
  </operation_name>
  <diagnostics>
    <!-- diagnostics information -->
  </diagnostics>
</response>

Example of the general message format of a TAPIR response when the envelope is not turned off
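
On the client side, the three parts of an enveloped response can be separated with a few lines of Python. This is only a sketch: the function name split_envelope and the sample response are made up, and the code does not handle search responses sent with the envelope turned off.

```python
import xml.etree.ElementTree as ET

TAPIR_NS = "http://rs.tdwg.org/tapir/1.0"


def split_envelope(xml_text: str):
    """Return (header, operation element, diagnostics) of a TAPIR response."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{TAPIR_NS}}}response":
        raise ValueError("not a TAPIR response envelope")
    header = root.find(f"{{{TAPIR_NS}}}header")
    diagnostics = root.find(f"{{{TAPIR_NS}}}diagnostics")
    # The operation element is the child that is neither header nor diagnostics.
    operation = next(
        (child for child in root if child is not header and child is not diagnostics),
        None,
    )
    return header, operation, diagnostics


sample_response = (
    f'<response xmlns="{TAPIR_NS}">'
    "<header/><capabilities/><diagnostics/>"
    "</response>"
)
header, operation, diagnostics = split_envelope(sample_response)
print(operation.tag)
```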

4.5. Namespaces

The namespace for this version of TAPIR is http://rs.tdwg.org/tapir/1.0

The TAPIR namespace must be the default namespace in all TAPIR messages.

TAPIR metadata responses include elements from the following namespaces:

Prefix  Namespace                                 Source
xml     http://www.w3.org/XML/1998/namespace      XML
xsd     http://www.w3.org/2001/XMLSchema          XML Schema
dc      http://purl.org/dc/elements/1.1/          Dublin Core
dct     http://purl.org/dc/terms/                 Dublin Core Terms
geo     http://www.w3.org/2003/01/geo/wgs84_pos#  Basic geo vocabulary
vcard   http://www.w3.org/2001/vcard-rdf/3.0#     VCARD

TAPIR metadata responses may change the prefixes for all namespaces above, except the "xml" prefix which is reserved according to Namespaces in XML 1.1.
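
Because prefixes may differ between providers, a client is safer matching elements by namespace URI than by prefix. The Python sketch below shows the idea; the two sample fragments, and the assumption that a Dublin Core "title" element appears in them, are purely illustrative.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_titles(xml_text: str) -> list:
    """Collect Dublin Core title values, whatever prefix the sender chose."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(f"{{{DC_NS}}}title")]


# Two hypothetical fragments that differ only in the prefix bound to Dublin Core.
with_dc = f'<metadata xmlns:dc="{DC_NS}"><dc:title>Example</dc:title></metadata>'
with_other = f'<metadata xmlns:dublin="{DC_NS}"><dublin:title>Example</dublin:title></metadata>'

print(dc_titles(with_dc) == dc_titles(with_other))
```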

4.6. Conceptual Binding

4.6.1. Conceptual Schemas

In TAPIR, conceptual schemas provide a formal definition of concepts that are used for querying and reporting the content of databases. Conceptual schemas usually focus on specific areas of knowledge, providing data models with various levels of detail. Languages that can be used to define conceptual schemas include XML Schema, RDF Schema, XMI and others. Examples of conceptual schemas used by the previous protocols that generated TAPIR are DarwinCore and ABCD, both defined with XML Schema.

Although the main TAPIR operations always need to reference concepts, the protocol has been created to be independent of any particular conceptual schema. TAPIR networks and data providers are free to create or choose from existing conceptual schemas. However, it is important to note that the interoperability level across different TAPIR providers will depend on the conceptual schemas that they use. TAPIR providers can only understand queries that reference known concepts, so a TAPIR client cannot send the same request to two TAPIR providers that have mapped different conceptual schemas.

TAPIR messages can also reference concepts from multiple conceptual schemas, which means that conceptual schemas can be modularised and extended if necessary.

4.6.2. Concepts

Concepts referenced by TAPIR (e.g., species name, observation date, locality name, registration number, etc.) are defined externally to TAPIR and do not have to come from a single conceptual schema. Although concepts can potentially represent classes, relationships or properties, this version of TAPIR limits its use to properties (content elements).

In TAPIR, concepts are only referenced by their identifiers, which are always treated as simple strings. These references take place in different parts of the protocol, such as filter expressions, output model mappings, capabilities responses and inventory operations. The TAPIR XML Schema defines a "qualifiedConceptReferenceType" that is used by most elements representing concepts, and which consists of a complex type with an attribute called "id". TAPIR makes no assumptions about how concept identifiers are defined and it does not enforce any particular pattern. The only exception occurs when a provider declares a Concept Name Server, in which case a particular pattern for concept identifiers must always be supported as an alternative (see next section). However, it is recommended that fully qualified concept identifiers be globally unique, permanently resolvable, and free from characters that are reserved in the query term of URLs.

By being globally unique they can be distinguished from any other possible concepts. By being permanently resolvable, a formal definition of the concept can be retrieved whenever necessary. By being free from reserved characters for the query term of URLs, they can be used directly as HTTP GET parameters in KVP request encoding.

When concepts come from an XML Schema conceptual model, the recommendation is to concatenate the namespace of the schema with the local XPath to the instance element that corresponds to the concept. For example:

The element

/DataSets/DataSet/Units/Unit/InstitutionID

which is part of ABCD version 2.06, becomes

http://www.tdwg.org/schemas/abcd/2.06/DataSets/DataSet/Units/Unit/InstitutionID
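
The recommended concatenation is simple enough to express as a one-line helper. The function name below is hypothetical; the namespace and XPath are the ones from the example above.

```python
def concept_id(namespace: str, xpath: str) -> str:
    """Concatenate a schema namespace with the local XPath of an element."""
    # Normalise the slashes so exactly one separates the two parts.
    return namespace.rstrip("/") + "/" + xpath.lstrip("/")


identifier = concept_id(
    "http://www.tdwg.org/schemas/abcd/2.06",
    "/DataSets/DataSet/Units/Unit/InstitutionID",
)
print(identifier)
```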

4.6.3. Concept Name Servers

TAPIR offers a means to abbreviate concept identifiers by using Concept Name Servers (CNS). TAPIR providers can optionally declare the usage of one or more Concept Name Servers as part of the capabilities response. A CNS is a simple service identified and accessed by a URL which returns a formatted list of conceptual schemas and their concepts, including aliases for them. A CNS can also be used as a source of conceptual schemas when configuring a TAPIR provider. The text document returned by a CNS must follow the format demonstrated in the following example:

# Sample CNS document

[concept_source]

label     = ABCD 2.06
namespace = http://www.tdwg.org/schemas/abcd/2.06
alias     = abcd2.06
location  = http://www.tdwg.org/schemas/abcd/2.06/abcd.xsd

[aliases]

InstitutionID = http://www.tdwg.org/schemas/abcd/2.06/DataSets/DataSet/Units/Unit/InstitutionID

Lines beginning with "#" are considered comments and should not be processed. The document can contain multiple occurrences of [concept_source], each one necessarily followed by an [aliases] section containing all of its concepts.
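
A reader for this format can be sketched in a few lines of Python. A hand-written loop is used because [concept_source] may occur several times. The function name and the returned data layout (a list of dictionaries, each with an "aliases" entry) are choices made for this sketch only.

```python
def parse_cns(text: str) -> list:
    """Parse a CNS document into a list of concept sources.

    Each source is a dict of its properties plus an "aliases" dict.
    """
    sources = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            if section == "concept_source":
                sources.append({"aliases": {}})
            continue
        key, separator, value = line.partition("=")
        if not separator or not sources:
            continue  # ignore anything that is not "key = value" inside a source
        key, value = key.strip(), value.strip()
        if section == "concept_source":
            sources[-1][key] = value
        elif section == "aliases":
            sources[-1]["aliases"][key] = value
    return sources


sample_cns = """
# Sample CNS document
[concept_source]
label     = ABCD 2.06
namespace = http://www.tdwg.org/schemas/abcd/2.06
alias     = abcd2.06
location  = http://www.tdwg.org/schemas/abcd/2.06/abcd.xsd

[aliases]
InstitutionID = http://www.tdwg.org/schemas/abcd/2.06/DataSets/DataSet/Units/Unit/InstitutionID
"""
sources = parse_cns(sample_cns)
print(sources[0]["alias"], len(sources[0]["aliases"]))
```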

When a TAPIR provider declares a CNS, it must accept alternative concept identifiers in the following form:

Concept_Alias '@' Concept_Source_Alias

This would enable the previous concept to be represented as

InstitutionID@abcd2.06

Concept Name Servers must be used with caution, usually in closed world scenarios where networks mandate that all participating TAPIR providers declare them. Otherwise TAPIR providers will not understand any queries formulated with aliased concepts.

Before using multiple Concept Name Servers, one must also make sure that there are no alias clashes among them.
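
The alias form and the clash check can be sketched as follows. Each CNS is modelled here as a plain dictionary mapping a concept source alias to its concept aliases; this layout, the function names, and the definition of a clash (the same full alias bound to different identifiers) are assumptions of the sketch.

```python
def resolve_concept(identifier: str, cns: dict) -> str:
    """Expand 'Concept_Alias@Concept_Source_Alias'; pass other identifiers through."""
    concept_alias, separator, source_alias = identifier.rpartition("@")
    if separator and concept_alias in cns.get(source_alias, {}):
        return cns[source_alias][concept_alias]
    return identifier


def alias_clashes(*servers: dict) -> set:
    """Return full aliases that different servers map to different identifiers."""
    seen, clashes = {}, set()
    for cns in servers:
        for source_alias, concepts in cns.items():
            for concept_alias, concept in concepts.items():
                key = f"{concept_alias}@{source_alias}"
                # setdefault keeps the first identifier seen for this alias.
                if seen.setdefault(key, concept) != concept:
                    clashes.add(key)
    return clashes


ABCD_ID = "http://www.tdwg.org/schemas/abcd/2.06/DataSets/DataSet/Units/Unit/InstitutionID"
cns_a = {"abcd2.06": {"InstitutionID": ABCD_ID}}
cns_b = {"abcd2.06": {"InstitutionID": "http://example.net/other/InstitutionID"}}  # fictitious

print(resolve_concept("InstitutionID@abcd2.06", cns_a))
print(alias_clashes(cns_a, cns_b))
```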

4.7. Output Models

Output models are central to the search operation. They define a generic XML response structure based on XML Schema and a mapping between nodes in the schema and concepts from one or more conceptual schemas. To a certain extent, the mapping section gives a "meaning" to XML nodes in the structure, and clearly shows that the same concepts can be structured in different ways (each different structure could be defined by a separate output model).

Output models also include an indexing element by pointing to a node in the structure that should be used as a reference for record counting and paging.

Output models can be defined and used in various ways. A provider may create its own output models, or recognise existing ones, and advertise them as <knownOutputModels> in its capabilities response. If the provider supports <anyOutputModels>, clients may also define their own models, either as external documents or as in-line definitions in an XML search request. All required concepts in each output model advertised as known by the provider must refer to concepts mapped by the provider and advertised as <mappedConcepts> in the capabilities response.

The different ways of using output models in TAPIR allow for providers with different levels of service capability. Some providers may have a fixed (hard-coded) way of producing XML results that corresponds to each of their known output models, which means they do not need to have the ability to dynamically parse output model definitions. On the other hand, providers that have the ability to parse output model definitions (<anyOutputModels> capability) may choose to parse known models in the same way as they do for arbitrary client-provided models.

Models can be shared globally by listing them in a CNS. The location of the TAPIR model document together with an optional alias is listed in a single section of the CNS:

# Sample CNS model section

[models]

abcd120                       = http://rs.tdwg.org/tapir/cs/abcd1.20/model/abcd120.xml
abcd206                       = http://rs.tdwg.org/tapir/cs/abcd2.06/model/abcd206.xml
abcd206_getCoordinatesOfTaxon = http://rs.tdwg.org/tapir/cs/abcd2.06/model/getCoordinatesOfTaxon.xml

4.8. Query Templates

Query templates extend the idea of output models. If output models define what type of content should be returned and how it should be structured, query templates can add pre-defined, parameterised filters and other constraints depending on the operation.

Query templates can be used by search and inventory operations. An inventory template specifies one or more concepts and an optional filter. A search template specifies an output model, an optional filter, an optional order by parameter (pointing to concepts in the output model), and an optional selection of nodes to be returned from the output model (through the "partial" parameter).

For search operations, the same output model can be referenced and used by many different search templates, which could be related to different parts of the output model, or include different filter criteria for different contexts, etc.

The same flexibility for creating and using output models is available for query templates. Data providers can create their own specific templates, or choose from existing templates and then declare them in capabilities responses. Providers can also have the ability to dynamically parse arbitrary templates defined by clients.

Templates can be shared globally in the same way as models. The location of the TAPIR query template document together with an optional alias is listed in a single section of the CNS:

# Sample CNS template section

[templates]

abcd206                       = http://rs.tdwg.org/tapir/cs/abcd2.06/template/abcd206.xml

5. TAPIR XML Documents

TAPIR XML documents can be messages exchanged in operations or external resources such as output models and query templates, which can be referenced by search requests. All TAPIR documents must be valid XML documents and have a root element that validates against the TAPIR XML Schema. The root element of a TAPIR document declares its purpose and includes the namespaces referenced inside the document. It is recommended that the root element includes the "xsi" namespace and the TAPIR schema location to facilitate validation of messages. The allowable root elements according to the TAPIR XML Schema are as follows:

Root element ::= ( <request> | <response> | <outputModel> | <inventoryTemplate> | <searchTemplate> )

<?xml version="1.0" encoding="UTF-8" ?> 
<request xmlns="http://rs.tdwg.org/tapir/1.0" 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0 
                             http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
   <!--   header and operation specific content -->
</request>

Example: Root element with namespace declarations in a TAPIR request document
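
A lightweight pre-check of the root element, short of full schema validation, might look like the Python sketch below. The function name document_kind is hypothetical, and the check covers only the element name and namespace, not the content.

```python
import xml.etree.ElementTree as ET

TAPIR_NS = "http://rs.tdwg.org/tapir/1.0"
ROOT_ELEMENTS = {"request", "response", "outputModel", "inventoryTemplate", "searchTemplate"}


def document_kind(xml_text: str) -> str:
    """Return the local name of the root element if it is an allowed TAPIR root."""
    root = ET.fromstring(xml_text)
    # ElementTree spells qualified names as "{namespace}local".
    namespace, _, local = root.tag.rpartition("}")
    if namespace != "{" + TAPIR_NS or local not in ROOT_ELEMENTS:
        raise ValueError(f"not a TAPIR document root: {root.tag}")
    return local


kind = document_kind(f'<outputModel xmlns="{TAPIR_NS}"/>')
print(kind)
```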

5.1. Request and Response Documents

TAPIR XML request documents consist of a header followed by an element indicating the operation, which in turn may contain additional parameters.

Request type ::= <header> 
                 ( <metadata> | <capabilities> | <inventory> | <search> | <ping> )

TAPIR XML response documents usually consist of a header followed by elements indicating the operation and result of the operation. The operation element can be followed by an optional diagnostics element.

Response type ::= <header>
                  ( <metadata> | <capabilities> | <inventory> | <search> | <pong> | <error> | <logged> ) 
                  <diagnostics>?

This general structure of a TAPIR <response> element, including a <header>, an operation-related element and an optional <diagnostics>, is called the TAPIR envelope. The TAPIR envelope can be turned off in search operations by setting the "envelope" parameter to "false". In this case the root element of a response will be determined by the output model response structure definition, and the TAPIR namespace should not be included at all.

5.1.1. Header

The purpose of headers is to give information about the source and destination of the operation, as well as timestamp and software related to the source. Headers must be present in all XML requests and all XML responses, except in search responses when the parameter "envelope" is set to "false" in the corresponding request. A TAPIR header has three parts:

Header type ::= <source>+ <destination>? <custom>?

The <source> element gives information about where the message originated and is repeatable to enable tracing back through any intermediary steps when the message has passed through more than one server in a cascading operation. Each intermediary service must add its address as a new <source> item at the end of the list.

The <destination> element is used to indicate the final target for a TAPIR message. It can be used when there are intermediary layers between the client and the server. This element is intended to help communication between clients and message brokers. The destination element takes a simple string, which will usually be a URI but can be anything, including codes or identifiers specific to networks. It is optional and a TAPIR provider is free to ignore it.

A <custom> element serves as an extension slot for any additional information not defined in the schema. It can be used to put whatever extra information an implementer wishes to add.

Each <source> element corresponds to a software agent that created or processed the message before it reached the service that received it. A <source> element has three parts:

Source type ::= @accesspoint @sendtime <software>?

In requests, the "accesspoint" attribute must contain the IP address of each "source", except the last one. The IP address of the last source should always be taken from the REMOTE_ADDR environment variable. In responses, the "accesspoint" attribute must contain the service URL.

The "sendtime" attribute must be used to record the time that the message was sent or processed in the associated service. Its content must be recorded in ISO 8601 datetime format.
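
In Python, a suitable value can be produced with the standard datetime module. The sketch below formats a timezone-aware timestamp with millisecond precision; the precision and the function name are choices of this sketch rather than requirements stated here.

```python
from datetime import datetime, timedelta, timezone


def format_sendtime(moment: datetime) -> str:
    """Format a datetime as an ISO 8601 string with millisecond precision."""
    return moment.isoformat(timespec="milliseconds")


# A fixed instant, one hour ahead of UTC.
fixed = datetime(2005, 11, 11, 12, 23, 56, 23000, tzinfo=timezone(timedelta(hours=1)))
print(format_sendtime(fixed))

# The current time in UTC.
now_text = format_sendtime(datetime.now(timezone.utc))
print(now_text)
```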

The <software> element can be used to identify the software used to process the TAPIR message. It is defined as follows:

Software type ::= @name @version <dependencies>*

The attributes "name" and "version" are simple strings and should be used to indicate the name and version number of the software.

The <dependencies> element can be used to list any other software, libraries, frameworks or operating systems related to the declared software. It contains repeatable <dependency> elements, each of which is again of the software type.

<request xmlns="http://rs.tdwg.org/tapir/1.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                             http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source sendtime="2005-11-11T12:23:56.023+01:00">
      <software name="TapirClient" version="3.0"/>
    </source>
  </header>
  <capabilities/>
</request>

Example of a TAPIR request message encoded in XML. Here the client is asking for service capabilities.
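
An intermediary that forwards a message has to add itself to the end of the list of <source> elements, ahead of any <destination> or <custom> element. The Python sketch below shows one way to do that; the function name and sample header are hypothetical, and deciding which sources carry an "accesspoint" attribute is left to the caller.

```python
import xml.etree.ElementTree as ET

TAPIR_NS = "http://rs.tdwg.org/tapir/1.0"
SOURCE_TAG = f"{{{TAPIR_NS}}}source"


def add_source(header, sendtime, accesspoint=None):
    """Insert a new <source> right after the last existing <source> in a header."""
    attributes = {"sendtime": sendtime}
    if accesspoint is not None:
        attributes["accesspoint"] = accesspoint
    position = 0
    for index, child in enumerate(header):
        if child.tag == SOURCE_TAG:
            position = index + 1  # remember the slot after the last source
    new_source = ET.Element(SOURCE_TAG, attributes)
    header.insert(position, new_source)
    return new_source


header = ET.fromstring(
    f'<header xmlns="{TAPIR_NS}">'
    '<source sendtime="2005-11-11T12:23:56.023+01:00"/>'
    "<destination>http://example.net/tapir.cgi</destination>"
    "</header>"
)
add_source(header, "2005-11-11T12:23:56.500+01:00")
print([child.tag.split("}")[1] for child in header])
```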

5.1.2. Operation-specific Elements

Operation elements can contain specific parameters (in requests) or specific results (in responses) related to the operation. They will be described in more detail in the next sections.

5.1.3. Fatal Errors

Errors are normally listed in the diagnostics section of a TAPIR message. When an extreme error prevents the system from formulating a proper response, for instance if a database connection error occurs or the user requests a service that the provider does not supply, the <error> element may be used inside the <response> body.

<response>
  <!-- omitting header -->
  <error code="DBS_CONNECTION_ERROR" 
         level="fatal" 
         time="2005-11-11T12:23:57.023+01:00">Could not connect to database</error>
</response>

Example of a fatal error message given in place of the normal operation element in a response
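
A client may want to turn such a response into an exception before looking for the operation element. The following Python sketch does so; the exception class and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

TAPIR_NS = "http://rs.tdwg.org/tapir/1.0"


class TapirFatalError(Exception):
    """Raised when a response carries an <error> element (hypothetical class)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def raise_on_error(response):
    """Raise TapirFatalError if the response body contains an <error> element."""
    error = response.find(f"{{{TAPIR_NS}}}error")
    if error is not None:
        raise TapirFatalError(error.get("code"), (error.text or "").strip())


failed = ET.fromstring(
    f'<response xmlns="{TAPIR_NS}">'
    '<error code="DBS_CONNECTION_ERROR" level="fatal">Could not connect to database</error>'
    "</response>"
)
try:
    raise_on_error(failed)
except TapirFatalError as exc:
    print(exc)
```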

5.1.4. Log-only Operations

When requests have the "log-only" attribute set to true, TAPIR providers produce a <logged> element in responses instead of the expected operation element. Log-only operations can be used to report back to the original data providers when users access their data from data aggregators.

5.1.5. Diagnostics

The <diagnostics> element can be used by responses for declaring errors and statistics. It consists of an optional list of multiple <diagnostic> elements.

diagnostic type ::= @code? @level @time? message

The "code" attribute is an optional system code for the error or message. There are no standard codes defined in this specification. TAPIR providers are free to define and use their own codes.

The "level" attribute is mandatory and can assume the following values to indicate the severity of the diagnostic:

@level ::= ( "debug"|"info"|"warn"|"error"|"fatal" )

An optional ISO 8601 datetime value can be included in the "time" attribute.

Diagnostic messages are text strings defined by implementations.

<response>
  <!-- omitting header -->
  <!-- omitting operation -->
  <diagnostics>
    <diagnostic level="error" code="REQ_READ_MODEL_FAILED">The requested model could not be read.</diagnostic>
    <diagnostic level="warn" code="REQ_STRUCTURE_UNKNOWN_SCHEMA_TAG">
      Unknown xml schema element "test" encountered in line 3. Ignored.
    </diagnostic>
    <diagnostic level="info" code="RSP_ELEM_DROP">
      XML element "myel" dropped because it misses required attribute "myattr"
    </diagnostic>
    <diagnostic level="debug">Start reading the datasource configurations</diagnostic>
  </diagnostics>
</response>

Example of a diagnostics section in a response document
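The sketch below shows one way a client could read a diagnostics section and keep only entries at or above a chosen severity. It assumes that the @level values are listed in ascending order of severity, which this specification does not state explicitly. The function name diagnostics_at_least and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TAPIR_NS = "http://rs.tdwg.org/tapir/1.0"

# Values allowed for @level; the ascending severity order is an assumption.
LEVELS = ["debug", "info", "warn", "error", "fatal"]


def diagnostics_at_least(response_xml, minimum="warn"):
    """Return (level, code, message) tuples whose level is >= minimum."""
    root = ET.fromstring(response_xml)
    threshold = LEVELS.index(minimum)
    selected = []
    for diag in root.iter(f"{{{TAPIR_NS}}}diagnostic"):
        level = diag.get("level")
        if level in LEVELS and LEVELS.index(level) >= threshold:
            # Collapse the whitespace that pretty-printing leaves in the message.
            message = " ".join((diag.text or "").split())
            selected.append((level, diag.get("code"), message))
    return selected


SAMPLE = """<response xmlns="http://rs.tdwg.org/tapir/1.0">
  <diagnostics>
    <diagnostic level="error" code="REQ_READ_MODEL_FAILED">The requested model could not be read.</diagnostic>
    <diagnostic level="warn" code="REQ_STRUCTURE_UNKNOWN_SCHEMA_TAG">
      Unknown xml schema element "test" encountered in line 3. Ignored.
    </diagnostic>
    <diagnostic level="debug">Start reading the datasource configurations</diagnostic>
  </diagnostics>
</response>"""

for level, code, message in diagnostics_at_least(SAMPLE, "warn"):
    print(level, code, message)
```

Because "code" is optional and provider-defined, the sketch treats it as an opaque string that may be None.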

5.2. Output Models

TAPIR search requests and search templates can refer to an output model document which is accessed as an external resource by its URL. An output model document must be an XML document that validates against the "outputModelType" defined in the TAPIR XML Schema. Output model documents have no header or diagnostic sections and use <outputModel> as the root element including the namespace declarations.

Output model documents can include optional (but recommended) documentation elements for a name (<label>) and description (<documentation>) that are of value in managing multiple output models and informing users of their function.

Output models can be defined with three elements:

<structure>
Defines the search response structure definition using a subset of the XML Schema language.

<mapping>
Maps nodes from the structure definition (given under <structure>) to concepts, literals or environment variables known to the provider software.

<indexingElement>
Points to a node in the structure, usually unbounded, that should be used for counting and paging returned 'records'. The indexing element is referenced by the @path attribute using a simple XPath that points to a response structure node.
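To make the role of the indexing element concrete, here is a minimal Python sketch that counts the 'records' in a response document by following an indexing path. It assumes a simple absolute path made only of element steps and, for brevity, a sample document without namespaces. The function name count_indexed_records is hypothetical.

```python
import xml.etree.ElementTree as ET


def count_indexed_records(document_xml, indexing_path):
    """Count the nodes selected by an absolute, slash-separated element path."""
    steps = indexing_path.strip("/").split("/")
    root = ET.fromstring(document_xml)
    if root.tag != steps[0]:
        return 0
    if len(steps) == 1:
        return 1
    # ElementTree paths are relative to the element they are called on,
    # so the root step is checked above and dropped here.
    return len(root.findall("/".join(steps[1:])))


# Hypothetical response content with three 'records'.
SAMPLE = """<dataset>
  <specimen catnum="101"><identification><name>Aeshna cyanea</name></identification></specimen>
  <specimen catnum="102"/>
  <specimen catnum="103"/>
</dataset>"""

print(count_indexed_records(SAMPLE, "/dataset/specimen"))
```

A real provider would usually count and page at the data source rather than on the serialised document; the sketch only illustrates which nodes the path designates.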

5.2.1. Response Structures

Response structures must always specify a "targetNamespace" and must contain at least one global element definition. The first global element definition should be used to instantiate the root element in the resulting XML.

When no "partial" parameter is specified, the resulting XML should be "greedy", in the sense that it must contain as much data as possible. When a "partial" parameter is specified, the resulting XML should not be "greedy" - it must contain only the partial nodes and all mandatory nodes related to them (above and below the XML structure).

If a provider does not understand a non-required concept or variable related to a particular node or if it does not have content for it, the node must not be included in the response if it is optional. If the node is mandatory, it must be included with empty content.

TAPIR providers are not required to guarantee that search responses are fully valid against the XML Schema defined in the response structure, except to the extent of their own declared XML Schema capabilities.

It is recommended for providers to raise warnings instead of errors when an unsupported XML Schema construct is found in the response structure.

5.2.2. Output Model Mapping

Each data node in the output model structure must be mapped to one or more concepts (<concept>), literals (<literal>) or system variables (<variable>). The entire mapping is made of individual <node> mappings, where the attribute @path takes a simple XPath to identify the output structure node. Individual mappings are completed by specifying one or more sub-elements (<concept>|<literal>|<variable>) associated to each node.

Node identifiers follow a subset of the XPath language using relative location paths from the root element inside the schema definition (inside <structure>) to the desired node. All steps are separated by the "/" separator. The XPath expression used here is actually based on the corresponding nodes of an instance document, and not the schema definition itself. Attribute nodes need the prefix "@".

When multiple mappings are cited for the same node, the final result must be the concatenation of the respective values.

<node path="/FeatureCollection/featureMember/LocationGML/Point/coordinates">
  <concept id="http://example.net/schema/Longitude" required="true"/>
  <literal value=","/>
  <concept id="http://example.net/schema/Latitude" required="true"/>
</node>

Example: Concatenated concepts in an output model mapping

When the response structure makes use of different namespaces, they all need to be declared and associated to a prefix in the "outputModel" element. In this case, node paths must include the prefixes. The same applies to the "indexingElement" path.

<node path="/r:records/r:record/dc:modified">
  <concept id="http://example.net/schema/DateLastModified" required="true"/>
</node>

Example: Node mapping involving different namespaces. Prefixes "r" and "dc" must be declared in the "outputModel" element

Concepts and variables have an optional attribute "required" which defaults to "false". When a concept is required but is either not mapped by the provider or is mapped but evaluates to NULL, the provider must return an error. When a variable is required and is either not available from the provider or evaluates to NULL, the provider must return an error. If a concept or variable is optional and associated to a mandatory node, and is either not understood by the provider or evaluates to NULL, the node must be included with an empty content in the response. In this case it is recommended for providers to raise a warning in the diagnostics.

In a concatenation, optional concepts or variables should be replaced by an empty string if they evaluate to NULL or are not understood by the provider.
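The concatenation and "required" rules above can be sketched in Python as follows. This is an illustration under assumptions: the tuple representation of a mapping, the record dictionary, the function name evaluate_node and the exception RequiredValueError are all hypothetical, and a missing key or None stands in for both "not mapped" and NULL.

```python
class RequiredValueError(Exception):
    """Raised when a required concept or variable is unmapped or NULL."""


def evaluate_node(parts, record):
    """Concatenate the mapped parts of one node.

    parts:  list of (kind, value, required) tuples in document order, where
            kind is "concept", "variable" or "literal".
    record: dict from concept/variable identifier to its value.
    """
    pieces = []
    for kind, value, required in parts:
        if kind == "literal":
            pieces.append(value)
            continue
        resolved = record.get(value)
        if resolved is None:
            if required:
                raise RequiredValueError(f"required {kind} {value!r} has no value")
            resolved = ""  # optional parts collapse to an empty string
        pieces.append(str(resolved))
    return "".join(pieces)


# Mirrors the coordinates mapping shown earlier in this section.
coordinates = [
    ("concept", "http://example.net/schema/Longitude", True),
    ("literal", ",", False),
    ("concept", "http://example.net/schema/Latitude", True),
]
record = {
    "http://example.net/schema/Longitude": -71.92,
    "http://example.net/schema/Latitude": 45.256,
}
print(evaluate_node(coordinates, record))  # -71.92,45.256
```

Whether an empty result leads to an empty node or to an omitted node depends on whether the node is mandatory in the response structure; that decision is outside this sketch.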

The <mapping> element also includes an optional Boolean @automapping attribute. When automapping is set to "true", nodes should be automatically mapped to their equivalent concept identifiers (by concatenating namespace and local path). This is done when the model’s structural schema is also seen as a conceptual schema, therefore avoiding "redundant" mappings. This kind of special model is also referred to as a canonical model.

<?xml version="1.0" encoding="UTF-8"?>
<outputModel xmlns="http://rs.tdwg.org/tapir/1.0"
             xmlns:xs="http://www.w3.org/2001/XMLSchema"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                                 http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd
                                 http://www.w3.org/2001/XMLSchema
                                 http://www.w3.org/2001/XMLSchema.xsd">
  <structure>
    <xs:schema targetNamespace="http://example.net/simple_specimen">
      <xs:element name="dataset">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="specimen" minOccurs="0" maxOccurs="unbounded">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="identification" minOccurs="0" maxOccurs="unbounded">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name="name" type="xs:string"/>
                        <xs:element name="identifier" type="xs:string" minOccurs="0"/>
                      </xs:sequence>
                      <xs:attribute name="date" type="xs:string" use="optional"/>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
                <xs:attribute name="catnum" type="xs:int" use="required"/>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>
  </structure>
  <indexingElement path="/dataset/specimen"/>
  <mapping>
    <node path="/dataset/specimen/@catnum">
      <concept id="http://example.net/schema1/CatalogNumber" required="true"/>
    </node>
    <node path="/dataset/specimen/identification/name">
      <concept id="http://example.net/schema1/ScientificName" required="true"/>
    </node>
    <node path="/dataset/specimen/identification/identifier">
      <concept id="http://example.net/schema2/PersonName"/>
    </node>
    <node path="/dataset/specimen/identification/@date">
      <concept id="http://example.net/schema2/DateText"/>
    </node>
  </mapping>
</outputModel>

Example: An output model showing the use of the <structure>, <indexingElement> and <mapping> elements

5.3. Query Templates

5.3.1. Inventory Templates

TAPIR inventory requests can refer to an inventory template document which is accessed as an external resource by its URL. An inventory template must be an XML document that validates against the "inventoryTemplateType" defined in the TAPIR XML Schema. Inventory template documents have a root element <inventoryTemplate> that includes its namespace declarations. There are no header or diagnostic sections in template documents.

Inventory templates can include optional (but recommended) documentation elements for a name (<label>) and description (<documentation>) that are of value in managing multiple templates and informing users of their function. The body of an inventory template includes a list of concepts upon which the inventory should be built and an optional filter section which allows the use of client-supplied parameters.

<?xml version="1.0" encoding="UTF-8"?>
<inventoryTemplate xmlns="http://rs.tdwg.org/tapir/1.0"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                   xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                                       http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <label>Specimen Names</label>
  <documentation>Search and list unique scientific names of specimen 
  identifications ordered alphabetically by their name. The parameter 
  "name" can be used as a filter condition.</documentation>
  <concepts>
    <concept id="http://example.net/schema/ScientificName"/>
  </concepts>
  <filter>
    <like>
      <concept id="http://example.net/schema/ScientificName"/>
      <parameter name="name"/>
    </like>
  </filter>
</inventoryTemplate>

Example: A simple inventory template which allows the client to supply a scientific name (with wild cards if required) as a filter

5.3.2. Search Templates

TAPIR search requests can refer to a search template document which is accessed as an external resource by its URL. A search template must be an XML document that is valid with respect to the "searchTemplateType" defined in the TAPIR XML Schema.

Search templates can include optional (but recommended) documentation elements for a name (<label>) and description (<documentation>) that are of value in managing multiple templates and informing users of their function. The body of a search template includes a choice of external or internal output model and elements for refining, filtering and ordering the search results, as follows:

<outputModel>
Reference to an external output model document or inline definition of an output model.

<partial>
Optional element allowing the selection of a subset of the entire response structure to be returned. This feature is valuable when working with large external structures of which only a small part is required in the response document. Partial selections of the structure are made by declaring nodes using the @path attribute of the repeatable <node> element. <partial> can be used to select individual leaf nodes for individual concepts, or branch nodes if all concepts below a chosen node are to be included. A partial search must take care that response documents are still valid, as they may include elements not mapped by the provider or may omit mandatory elements. Mandatory elements or attributes of the response structure that are not listed in the partial search must be included explicitly by the provider in the response content.

<filter>
Optional element to specify filter conditions (see section about Filters).

<orderBy>
Optional repeatable element that can be used to declare one or more concepts for ordering returned data. If the optional Boolean attribute @descend is set to "true", a descending ordering will be used instead of the default ascending one.

<?xml version="1.0" encoding="UTF-8"?>
<searchTemplate xmlns="http://rs.tdwg.org/tapir/1.0"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                                    http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <label>Specimen by Name</label>
  <documentation>Search specimens by their scientific name. Result is 
  ordered by name (ascending) and catalog number (descending). 
  A parameter "name" can be used to build the filter.</documentation>
  <outputModel location="http://example.net/models/names.xml"/>
  <filter>
    <like>
      <concept id="http://example.net/schema/ScientificName"/>
      <parameter name="name"/>
    </like>
  </filter>
  <orderBy>
    <concept id="http://example.net/schema/ScientificName"/>
    <concept id="http://example.net/schema/CatalogNumber" descend="true"/>
  </orderBy>
</searchTemplate>

Example: An XML search template definition based on an external output model.
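The ordering requested by the template above (name ascending, then catalog number descending) can be sketched in Python as follows. The sketch assumes that the first concept listed is the most significant sort key, as the template's documentation text suggests. The function name apply_order_by and the record dictionaries are hypothetical.

```python
SCIENTIFIC_NAME = "http://example.net/schema/ScientificName"
CATALOG_NUMBER = "http://example.net/schema/CatalogNumber"


def apply_order_by(records, order_by):
    """Sort records by a list of (concept_id, descend) pairs.

    The first pair is treated as the most significant key (an assumption).
    """
    ordered = list(records)
    # Sort by the least significant key first; Python's sort is stable,
    # also with reverse=True, so earlier passes survive as tie-breakers.
    for concept_id, descend in reversed(order_by):
        ordered.sort(key=lambda rec: rec[concept_id], reverse=descend)
    return ordered


records = [
    {SCIENTIFIC_NAME: "Libellula depressa", CATALOG_NUMBER: 3},
    {SCIENTIFIC_NAME: "Aeshna cyanea", CATALOG_NUMBER: 1},
    {SCIENTIFIC_NAME: "Aeshna cyanea", CATALOG_NUMBER: 7},
]

ordered = apply_order_by(records, [(SCIENTIFIC_NAME, False), (CATALOG_NUMBER, True)])
for rec in ordered:
    print(rec[SCIENTIFIC_NAME], rec[CATALOG_NUMBER])
```

A provider backed by a relational database would more likely translate the same information into an ORDER BY clause; the in-memory sort is only meant to show the intended result.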

6. Operations

6.1. Metadata

The Metadata operation retrieves a basic description of the TAPIR service, such as its title, an abstract, keywords, related people and organisations, and copyright details. The inclusion of a language attribute in content elements allows content to be served in multiple languages.

Metadata responses should include enough information to be used by registries, such as UDDI, and to be used by other directory services that need to know general information about the content provided. Metadata are always related to a single TAPIR data provider, which is regarded as a completely independent service.

6.1.1. Metadata Request

Metadata is the default operation in TAPIR. The simplest way to invoke this operation is by calling the TAPIR access point directly without any parameters.

Using XML, the Metadata operation can be invoked by inserting the <metadata/> element after the header section in a request document. Metadata requests take no arguments or parameters.

<?xml version="1.0" encoding="UTF-8"?>
<request xmlns="http://rs.tdwg.org/tapir/1.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                             http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source sendtime="2005-11-11T12:23:56.023+01:00">
    </source>
  </header>
  <metadata/>
</request>

Example: An XML metadata request document

In KVP request encoding, the Metadata operation can be invoked with a single parameter:

http://example.net/tapir.cgi?op=metadata

Or with no parameters at all:

http://example.net/tapir.cgi
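As a small illustration, a client could build these KVP request URLs as in the Python sketch below. The helper name kvp_request_url is hypothetical, and the sketch assumes the access point URL carries no query string of its own; only the "op" parameter is taken from this specification.

```python
from urllib.parse import urlencode


def kvp_request_url(access_point, operation=None):
    """Build a KVP request URL; with no operation the bare access point is used."""
    if operation is None:
        # Metadata is the default operation, so no parameter is needed.
        return access_point
    return access_point + "?" + urlencode({"op": operation})


print(kvp_request_url("http://example.net/tapir.cgi", "metadata"))
print(kvp_request_url("http://example.net/tapir.cgi"))
```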

6.1.2. Metadata Response

The structure of metadata responses must conform to the "metadataResultType" defined in the TAPIR XML Schema. Many of the elements in the "metadataResultType" are derived from the DC (Dublin Core) schemas.

The "metadataResultType" also includes an optional @xml:lang attribute to define a default language associated with all language-aware elements. Language-aware elements are those elements whose content is expressed in natural language. They are all unbounded and accept an optional attribute @xml:lang to specify the related language code. The language tag syntax of @xml:lang attributes is defined by the RFC 4646, Tags for the Identification of Languages and the list of codes can be found in the IANA Language Subtag Registry. The @xml:lang attribute applies to the element that defines it and also to all of its sub-elements. Sub-elements can also specify the @xml:lang attribute, in which case it will override the default language defined in the scope of any parent elements.
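The inheritance rule for @xml:lang can be sketched in Python as follows: each element takes its own @xml:lang if present, otherwise the language in scope from its parent. The function name effective_languages and the sample fragment (including the German title) are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
DC_TITLE = "{http://purl.org/dc/elements/1.1/}title"


def effective_languages(element, inherited=None):
    """Yield (element, language) pairs, applying xml:lang inheritance."""
    language = element.get(XML_LANG, inherited)
    yield element, language
    for child in element:
        yield from effective_languages(child, language)


# Hypothetical metadata fragment with a default language and one override.
SAMPLE = """<metadata xmlns="http://rs.tdwg.org/tapir/1.0"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xml:lang="en">
  <dc:title>Global Dragonflies Database</dc:title>
  <dc:title xml:lang="de">Globale Libellen-Datenbank</dc:title>
</metadata>"""

root = ET.fromstring(SAMPLE)
titles = {
    language: element.text
    for element, language in effective_languages(root)
    if element.tag == DC_TITLE
}
print(titles)
```

The first title inherits "en" from <metadata>, while the second overrides it with "de".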

The metadata elements include:

<dc:title>
The name or names of the service, which may be in multiple languages (using the xml:lang attribute to identify the language code). String. Required.

<dc:type>
The type of resource according to the Dublin Core Type Vocabulary. This value should be the same for all TAPIR providers: http://purl.org/dc/dcmitype/Service, unless the type vocabulary is refined or changed in the future. The purpose is to indicate that the resource is actually a service. String. Required.

<accesspoint>
The URL of the service. String. Required.

<dc:description>
The description may include, but is not limited to, an abstract, a table of contents, a reference to a graphical representation of content, or a free-text account of the content. Can be provided in different languages. String. Required.

<dc:language>
The primary language of the data provided by the service. It is recommended to use codes defined by the IANA Language Subtag Registry. String. Required.

<dc:subject>
Subject and Keywords. Typically, a subject will be expressed as keywords, key phrases or classification codes that describe content provided by the resource. Recommended best practice is to select a value from a controlled vocabulary or formal classification scheme. String. Optional.

<dct:bibliographicCitation>
Recommended practice is to include sufficient bibliographic detail to identify the resource as unambiguously as possible, whether or not the citation is in a standard form. Can be provided in different languages. String. Optional.

<dc:rights>
Information about who can access the resource or about its security status, access regulations, etc. String. Can be provided in different languages. Optional.

<dct:modified>
Date on which the service was last modified. Date string, optional.

<dct:created>
Date on which the service was created. Date string, optional.

<indexingPreferences>
Used to inform data aggregators and indexers about the preferred start time, duration and frequency for performing this operation. Optional element with three attributes:

  • @startTime: In the XML Schema time format.
  • @maxDuration: In the XML Schema duration format.
  • @frequency: In the XML Schema duration format.

<relatedEntity>
A required, complex element indicating the entities related to the service.

<custom>
Optional element of any type to include any additional information that goes beyond the standard TAPIR metadata.

6.1.2.1. Related Entities

Related Entities describe one or more entities and their roles with respect to the service. In UDDI terms, TAPIR Related Entities correspond to Business Entities. A Related Entity can be for example the organisation or group that is hosting the service, providing the data, sponsoring the network, etc. This allows acknowledgement to any kind of organisation or even person that is somehow related to the service.

Related Entities are defined by the "relatedEntityInformationType", which is comprised of <role> and <entity> elements defined by the "entityInformationType".

The elements defined by the "relatedEntityInformationType" are as follows:

<role>
Used to specify one or more roles of a related entity. The suggested vocabulary includes two values, "data supplier" or "technical host", but accepts other values. String. Required.

<entity>
Required complex element that can occur multiple times. It includes the following sub-elements:

<identifier>
A globally unique identifier for the entity. String. Optional.

<name>
One or more names for the organisation. Includes an @xml:lang attribute to record language. String. Required.

<acronym>
The usual acronym for the organisation. String. Optional.

<logoURL>
A URL pointing to a small logo of the organisation. String. Optional.

<description>
Text description of the service. Includes an @xml:lang attribute to record language. String. Optional.

<address>
Free text for organisation address, not atomised. String. Optional.

<relatedInformation>
A URL that points to further information about the entity. String. Optional.

<hasContact>
Required complex element for details of individuals related to the entity. Includes role and VCARD details (see Contacts).

<geo:Point>
Optional complex element to indicate the entity location in decimal degrees (datum WGS84). Conforms to the W3C Basic Geo Vocabulary.

  • <geo:lat>: Latitude in decimal degrees (WGS84). Float. Required.
  • <geo:long>: Longitude in decimal degrees (WGS84). Float. Required.
  • <geo:alt>: Altitude in meters. Float. Optional.

<custom>
Slot available for extending the scope of the entity information provided. AnyType. Optional.

6.1.2.2. Contacts

Related entities must indicate at least one contact and its role.

<role>
Used to specify one or more roles of a contact with respect to the service. The suggested vocabulary includes two values, "data administrator" or "system administrator", but accepts other values. String. Required.

<vcard:VCARD>
Complex element from the VCARD namespace for personal details. Includes an @xml:lang attribute to record language.

<vcard:FN>
Free text non-atomised, full name of contact. Includes optional @xml:lang attribute. String. Required.

<vcard:TITLE>
Free text for contact's job title (e.g., "Director"). Includes optional @xml:lang attribute. String. Optional.

<vcard:TEL>
Free text for contact's telephone number. Includes optional @xml:lang attribute and optional enumerated @TYPE attribute with any of the values home, msg, work, pref, voice, fax, cell, pager, bbs, modem, car, isdn, or pics. String. Optional.

<vcard:EMAIL>
Text for contact's email. String. Optional.

<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://rs.tdwg.org/tapir/1.0"
          xmlns:dc="http://purl.org/dc/elements/1.1/"
          xmlns:dct="http://purl.org/dc/terms/"
          xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
          xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0 
                              http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source accesspoint="http://example.net/tapir.cgi" 
            sendtime="2005-11-11T12:23:56.023+01:00">
      <software name="TapirProvider" version="1.0"/>
    </source>
  </header>
  <metadata>
    <dc:title>Global Dragonflies Database</dc:title>
    <dc:type>http://purl.org/dc/dcmitype/Service</dc:type>
    <accesspoint>http://example.net/tapir.cgi</accesspoint>
    <dc:description>Global database about Dragonflies observation and specimen records</dc:description>
    <dc:language>EN</dc:language>
    <dc:subject>dragonflies dragonfly observation specimen arthropoda insecta odonata</dc:subject>
    <dct:bibliographicCitation>Global Dragonflies Database</dct:bibliographicCitation>
    <dc:rights>Creative Commons License</dc:rights>
    <dct:modified>2006-07-01T09:35:14+01:00</dct:modified>
    <dct:created>2006-01-01T00:00:00+01:00</dct:created>
    <indexingPreferences startTime="01:30:00Z" maxDuration="PT1H" frequency="P1M" />
    <relatedEntity>
      <role>data supplier</role>
      <entity>
        <identifier>http://purl.org/biodiv/myorg</identifier>
        <name>My Organisation</name>
        <acronym>MYORG</acronym>
        <logoURL>http://example.net/myorg.png</logoURL>
        <description>My Organisation hosts and maintains biodiversity databases</description>
        <relatedInformation>http://example.net/myorg</relatedInformation>
        <hasContact>
          <role>data administrator</role>
          <vcard:VCARD>
            <vcard:FN>My Name</vcard:FN>
            <vcard:TITLE>Director</vcard:TITLE>
            <vcard:TEL>11 11 11111111</vcard:TEL>
            <vcard:EMAIL>myname@example.net</vcard:EMAIL>
          </vcard:VCARD>
        </hasContact>
        <geo:Point>
          <geo:lat>45.256</geo:lat>
          <geo:long>-71.92</geo:long>
        </geo:Point>
      </entity>
    </relatedEntity>
  </metadata>
</response>

Example: A metadata response document

6.2. Capabilities

The Capabilities operation is used to retrieve the essential settings and technical information about a TAPIR service.

6.2.1. Capabilities Request

In XML, the Capabilities operation is invoked by inserting the <capabilities/> element after the header section in a request document. Capabilities requests take no arguments or parameters.

<?xml version="1.0" encoding="UTF-8"?>
<request xmlns="http://rs.tdwg.org/tapir/1.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                             http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source sendtime="2005-11-11T12:23:56.023+01:00">
    </source>
  </header>
  <capabilities/>
</request>

Example: An XML capabilities request document

In KVP request encoding, the Capabilities operation can be invoked with a single parameter, as follows:

http://example.net/tapir.cgi?op=capabilities

6.2.2. Capabilities Response

Capabilities responses contain five top level sections to indicate available operations, supported request encodings and parameters, mapped concepts, available variables and also global settings. An optional <custom> section can also be used to include any extra information not covered by the "capabilitiesResultType" defined in the XML Schema.

6.2.2.1. Operations

The <operations> element is intended to return the list of operations supported by the service, including optional capabilities that are specific to each operation. When an operation is supported, an element with the same name must be present inside the <operations> element. Ping, metadata and capabilities are mandatory operations that are simply declared with no further arguments.

<operations>
  <ping/>
  <metadata/>
  <capabilities/>
</operations>

Example: A provider supporting only the mandatory operations

A provider that supports the inventory operation must indicate one or more supported inventory templates, or the <anyConcepts/> capability. Providers that only support inventory templates should not accept inventory requests that do not reference them. Each inventory template must be indicated with a <template> element with an attribute @location pointing to an external document defining the template. A WSDL (Web Service Description Language) document describing the inventory template and its interface can optionally be included with an attribute @wsdl.

<operations>
  <ping/>
  <metadata/>
  <capabilities/>
  <inventory>
    <template location="http://example.net/tmpl/collector_inventory.xml"/>
    <template location="http://example.net/tmpl/genus_inventory.xml"
              wsdl="http://example.net/tmpl/genus_inventory.wsdl" />
  </inventory>
</operations>

Example: A provider supporting inventory with templates

Providers declaring the <anyConcepts/> capability should accept inventory requests involving one or more concepts that were advertised as being mapped by the provider. They must also support inventory requests involving any external inventory template that references any known concepts. Therefore, providers in this situation must additionally support arbitrary filters. The <anyConcepts/> capability is declared without any further arguments.

<operations>
  <ping/>
  <metadata/>
  <capabilities/>
  <inventory>
    <anyConcepts/>
  </inventory>
</operations>

Example: A provider supporting inventory in any concept

Providers are also allowed to support both <anyConcepts/> and templates, in case they wish to point to specific inventory templates for any particular reason. However, providers are not allowed to advertise support of the inventory operation with just an empty <inventory/> element.

If a provider supports the search operation, it must usually indicate either one or more supported search templates, or the <outputModels/> capability. Providers that only support search templates should not accept search requests that do not reference them. Each search template must be indicated with a <template> element having an attribute @location pointing to an external document defining the template. A WSDL (Web Service Description Language) document describing the search template and its interface can optionally be included with an attribute @wsdl.

<operations>
  <ping/>
  <metadata/>
  <capabilities/>
  <inventory>
    <anyConcepts/>
  </inventory>
  <search>
    <template location="http://example.net/tmpl/search_by_taxonomy.xml"/>
    <template location="http://example.net/tmpl/search_by_geography.xml"
              wsdl="http://example.net/tmpl/search_by_geography.wsdl" />
  </search>
</operations>

Example: A provider supporting search with templates

Providers declaring the <outputModels/> capability should indicate either one or more <knownOutputModels> or the <anyOutputModels> capability. Both can be declared, but an empty <outputModels/> element will be considered invalid.

Providers that only support a specific list of output models must understand filters, partial selection of response structure nodes and "order by" parameters. They may additionally support the <anyOutputModels> capability. When a provider declares support for a specific output model, it must be able to process any search request that references that same output model. Known output models are declared with the <outputModel> element with a @location attribute pointing to the document defining it.

<operations>
  <ping/>
  <metadata/>
  <capabilities/>
  <inventory>
    <anyConcepts/>
  </inventory>
  <search>
    <outputModels>
      <knownOutputModels>
        <outputModel location="http://example.net/models/taxonomy_rss.xml"/>
        <outputModel location="http://example.net/models/geography_kml.xml"/>
      </knownOutputModels>
    </outputModels>
  </search>
</operations>

Example: A provider supporting search with known output models

Providers may also declare the <anyOutputModels> capability, in which case they need to indicate which subset of the XML Schema language they understand. <anyOutputModels> refers to the ability to respond to search requests involving arbitrary output model definitions, assuming they make use of concepts that are mapped by the provider. Output models include a response structure defined with XML Schema. XML Schema is a large and very complex specification, and TAPIR does not expect providers to be able to understand or parse the whole language. The minimum set of the XML Schema language that needs to be understood by providers in this case is represented by the <basicSchemaLanguage/> capability, and it includes the following constructs of XML Schema: targetNamespace definition, element definition (including minOccurs and maxOccurs), attribute definitions (including attribute "use"), local definitions of complexType and simpleType, sequences, and the "all" definition. Therefore, when a provider declares the <anyOutputModels> capability, it must declare inside it at least the element <basicSchemaLanguage/>.

<operations>
  <ping/>
  <metadata/>
  <capabilities/>
  <inventory>
    <anyConcepts/>
  </inventory>
  <search>
    <outputModels>
      <anyOutputModels>
        <basicSchemaLanguage/>
        <import/>
      </anyOutputModels>
    </outputModels>
  </search>
</operations>

Example: A provider supporting search with any output models

The following constructs of XML Schema can be optionally supported and declared as part of the <anyOutputModels> element:

Providers are allowed to support both search templates and output models. However, they are not allowed to advertise support of the search operation with an empty <search/> element.

When providers support search with output models they are also allowed to support both <knownOutputModels> and <anyOutputModels>, but they are not allowed to declare an empty <outputModels/> element.
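The declaration rules in this section lend themselves to a simple consistency check. The Python sketch below applies them to an <operations> fragment: the three mandatory operations must be present, <inventory/>, <search/> and <outputModels/> must not be empty, and <anyOutputModels> must contain <basicSchemaLanguage/>. The function name check_operations and the sample fragments are hypothetical, and the fragments are left without a namespace for brevity, whereas a real capabilities response would use the TAPIR namespace.

```python
import xml.etree.ElementTree as ET


def check_operations(operations_xml):
    """Return a list of rule violations found in an <operations> fragment."""
    ops = ET.fromstring(operations_xml)
    problems = []
    for name in ("ping", "metadata", "capabilities"):
        if ops.find(name) is None:
            problems.append(f"mandatory operation <{name}/> is missing")
    for name in ("inventory", "search"):
        element = ops.find(name)
        if element is not None and len(element) == 0:
            problems.append(f"<{name}/> must not be empty")
    for models in ops.iter("outputModels"):
        if len(models) == 0:
            problems.append("<outputModels/> must not be empty")
    for any_models in ops.iter("anyOutputModels"):
        if any_models.find("basicSchemaLanguage") is None:
            problems.append("<anyOutputModels> must declare <basicSchemaLanguage/>")
    return problems


VALID = """<operations>
  <ping/><metadata/><capabilities/>
  <inventory><anyConcepts/></inventory>
  <search>
    <outputModels>
      <knownOutputModels>
        <outputModel location="http://example.net/models/taxonomy_rss.xml"/>
      </knownOutputModels>
    </outputModels>
  </search>
</operations>"""

INVALID = """<operations>
  <ping/><metadata/><capabilities/>
  <inventory/>
  <search><outputModels/></search>
</operations>"""

print(check_operations(VALID))
print(check_operations(INVALID))
```

Such a check is only a convenience; validation against the TAPIR XML Schema remains the authoritative test of a capabilities response.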

6.2.2.2. Requests

The <requests> element is intended to provide information on what request encodings the service can respond to, whether it handles log-only requests, whether xslt can be applied on the server side and what filter capabilities are supported.

This section includes three sub-sections:

<encoding>
Indicates what request encodings are supported. The options are <kvp/> (Key-Value Pairs) and <xml/>. Support of key-value pairs is mandatory, while the XML encoding is optional.

<globalParameters>
Indicates if the provider "accepts", "requires" or "denies" log-only requests, and whether it supports applying xslt on the server side.

<filter>
Lists the expressions and Boolean operators understood by the provider service and which may be used in constructing filters.
<requests>
  <encoding>
    <kvp/>
  </encoding>
  <globalParameters>
    <logOnly>denied</logOnly>
  </globalParameters>
  <filter/>
</requests>

Example: Fragment of a <requests> declaration in a capabilities response document showing the minimum functionality that a TAPIR service must be able to provide. Note that in this case no filter capabilities are declared.

Filters are used in search and inventory operations. The <filter> element in capabilities responses lists all the filter operations and terms that are supported by the provider. When a provider declares the filter <encoding> element, a minimum set of filtering capabilities must be supported and indicated for the sake of clarity. The only optional filtering capability in this case is related to the arithmetic operators.

<requests>
  <encoding>
    <kvp/>
    <xml/>
  </encoding>
  <globalParameters>
    <logOnly>accepted</logOnly>
    <xslt/>
  </globalParameters>
  <filter>
      <encoding>
        <expression>
          <concept/>
          <literal/>
          <parameter/>
          <variable/>
          <arithmetic>
            <add/>
            <sub/>
            <div/>
            <mul/>
          </arithmetic>
        </expression>
        <booleanOperators>
          <logical>
            <not/>
            <and/>
            <or/>
          </logical>
          <comparative>
            <equals caseSensitive="false"/>
            <greaterThan/>
            <greaterThanOrEquals/>
            <lessThan/>
            <lessThanOrEquals/>
            <in/>
            <isNull/>
            <like caseSensitive="false"/>
          </comparative>
        </booleanOperators>
      </encoding>
  </filter>
</requests>

Example: A TAPIR provider declaring support for both XML and KVP encodings, accepting log-only requests and XSLT processing on the server side, and declaring the complete filter functionality.
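As an illustration of how a client might inspect a <requests> declaration before choosing a request encoding, the following Python sketch reads the minimal fragment shown earlier. It is only a sketch under stated assumptions: the function name read_request_capabilities is hypothetical, and the TAPIR namespace is omitted from the fragment for brevity.

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the minimal <requests> example
# (TAPIR namespace omitted for brevity; this is an assumption of the sketch).
REQUESTS_XML = """
<requests>
  <encoding>
    <kvp/>
  </encoding>
  <globalParameters>
    <logOnly>denied</logOnly>
  </globalParameters>
  <filter/>
</requests>
"""

def read_request_capabilities(xml_text):
    """Return (encodings, log-only policy, server-side XSLT flag, filter flag)."""
    root = ET.fromstring(xml_text)
    # find() only looks at direct children, so an <encoding> nested
    # inside <filter> is not confused with the request encodings.
    encodings = {child.tag for child in root.find("encoding")}
    log_only = root.findtext("globalParameters/logOnly")
    xslt = root.find("globalParameters/xslt") is not None
    # An empty <filter/> means that no filter capabilities are declared.
    filters = root.find("filter/encoding") is not None
    return encodings, log_only, xslt, filters

encodings, log_only, xslt, filters = read_request_capabilities(REQUESTS_XML)
print(encodings, log_only, xslt, filters)
```

A client could use the result to fall back to KVP when "xml" is not among the declared encodings, and to avoid sending log-only requests when the policy is "denied".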

When a provider supports filters, it can use the caseSensitive attribute to indicate whether comparisons such as equals and like are case sensitive. When the attribute is not specified, it defaults to "true" (case sensitive).

6.2.2.3. Concepts

The <concepts> element is intended to provide details of recognised concept name servers, conceptual schemas and individually mapped concepts from those schemas.

Support for concept name servers is optional, but at least one conceptual schema must be mapped with at least one concept. The underlying database structure remains opaque to TAPIR clients, which interact with the underlying database by reference to the mapped concepts. Concept name servers act as a repository of conceptual schemas. Any number of concept name servers can be listed as long as they do not contradict each other. There is no preferred locality for name servers; the only restriction is that each one has a valid URL that responds with a text file compliant with the format described in the section about Concept Name Servers.

Concept name servers are identified using the single @location attribute of the <server> element and can be cited as any valid URI.

Recognised conceptual schemas are declared using the @namespace and @location attributes of the <schema> element, both of which are required and must be valid URIs. An optional @alias can be assigned to the schema, usually reflecting a CNS alias for the schema.

Within each schema declaration the provider must list the recognised concepts of that schema using the <mappedConcept> element. Each concept is declared through an instance of the <mappedConcept> element, which contains four attributes:

@id
The fully qualified concept identifier. Required. String.

@searchable
Denotes whether the concept can be used in filters. Optional (defaults to "true"). Boolean.

@required
Denotes that this is a mandatory concept that needs to be present in output models of all search requests. Optional (defaults to "false"). Boolean.

@alias
A local alias associated with the concept, usually reflecting a CNS alias for the concept. Optional. String.
<concepts>
  <conceptNameServers>
    <server location="http://example.net/cns/main.txt"/>
    <server location="http://example.net/cns/optionals.txt"/>
  </conceptNameServers>
  <schema namespace="http://example.net/s/1"
          location="http://example.net/s/1/schema.xsd">
    <mappedConcept id="http://example.net/s/1/CollectionCode"/>
    <mappedConcept id="http://example.net/s/1/CollectionName" searchable="false"/>
  </schema>
  <schema namespace="http://example.net/s/2"
          location="http://example.net/s/2/schema.xsd">
    <mappedConcept id="http://example.net/s/2/CatalogNumber"/>
    <mappedConcept id="http://example.net/s/2/Rights" required="true"/>
  </schema>
</concepts>

Example: Example of a concepts declaration that might appear in a capabilities response document
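The attribute defaults described above can be applied by a client as in the following Python sketch. It is a minimal illustration, not a reference implementation: the function name read_mapped_concepts is hypothetical, and the TAPIR namespace is left out of the fragment for brevity.

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the concepts example
# (TAPIR namespace omitted for brevity; this is an assumption of the sketch).
CONCEPTS_XML = """
<concepts>
  <schema namespace="http://example.net/s/1"
          location="http://example.net/s/1/schema.xsd">
    <mappedConcept id="http://example.net/s/1/CollectionCode"/>
    <mappedConcept id="http://example.net/s/1/CollectionName" searchable="false"/>
  </schema>
  <schema namespace="http://example.net/s/2"
          location="http://example.net/s/2/schema.xsd">
    <mappedConcept id="http://example.net/s/2/CatalogNumber"/>
    <mappedConcept id="http://example.net/s/2/Rights" required="true"/>
  </schema>
</concepts>
"""

TRUE_VALUES = ("true", "1")  # lexical forms of an XML Schema boolean "true"

def read_mapped_concepts(xml_text):
    """Map each concept id to its searchable/required flags and alias."""
    concepts = {}
    for node in ET.fromstring(xml_text).iter("mappedConcept"):
        concepts[node.get("id")] = {
            # @searchable defaults to "true", @required defaults to "false".
            "searchable": node.get("searchable", "true") in TRUE_VALUES,
            "required": node.get("required", "false") in TRUE_VALUES,
            "alias": node.get("alias"),  # None when no alias is declared
        }
    return concepts

mapped = read_mapped_concepts(CONCEPTS_XML)
searchable = sorted(cid for cid, c in mapped.items() if c["searchable"])
required = sorted(cid for cid, c in mapped.items() if c["required"])
print(searchable)
print(required)
```

With such a table a client can refuse to build filters on non-searchable concepts and can check that required concepts appear in its output models.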

6.2.2.4. Variables

The <variables> element indicates system environment variables that can be used as filter expressions (see section about filter expressions). Each supported variable must be indicated with an element of the same variable name inside the element <environment>. When the <variables> element is empty it means that no variables are supported.

<variables>
  <environment>
    <date/>
    <timestamp/>
    <dataSourceName/>
    <accessPoint/>
    <lastUpdated/>
  </environment>
 </variables>

Example: Fragment of a capabilities response declaring all environmental variables defined by the TAPIR XML Schema and an extra one.

6.2.2.5. Settings

The <settings> element indicates specific service settings related to server overload caused by requests for excessive amounts of data. There are five settings of interest to client software, all of them optional, and further ones can be declared using the <custom> element. The five settings all take positive integer values:

minQueryTermLength
The minimum length of a wildcard string used in "like" expressions.

maxElementRepetitions
The maximum number of repetitions allowed for any repeatable XML elements in search and inventory responses. It can also be used as a reference for paging.

maxElementLevels
The maximum number of nested XML levels allowed for search responses (not including the TAPIR envelope).

maxResponseTags
The maximum number of XML tags that can be returned in responses.

maxResponseSize
The maximum size in kilobytes allowed to be returned in responses.
<settings>
  <minQueryTermLength>2</minQueryTermLength>
  <maxElementRepetitions>100</maxElementRepetitions>
  <maxElementLevels>20</maxElementLevels>
</settings>

Example: Fragment of a capabilities response showing service settings.
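Since maxElementRepetitions can be used as a reference for paging, a client might cap its page size with it. The Python sketch below shows one possible way to do this; the function names read_settings and choose_limit are hypothetical, the namespace is omitted, and custom settings are ignored.

```python
import xml.etree.ElementTree as ET

# Fragment modelled on the settings example
# (TAPIR namespace omitted for brevity; this is an assumption of the sketch).
SETTINGS_XML = """
<settings>
  <minQueryTermLength>2</minQueryTermLength>
  <maxElementRepetitions>100</maxElementRepetitions>
  <maxElementLevels>20</maxElementLevels>
</settings>
"""

# The five settings named in the text; all are optional positive integers.
KNOWN_SETTINGS = (
    "minQueryTermLength",
    "maxElementRepetitions",
    "maxElementLevels",
    "maxResponseTags",
    "maxResponseSize",
)

def read_settings(xml_text):
    """Return the declared settings as a dict of integers."""
    root = ET.fromstring(xml_text)
    found = {}
    for name in KNOWN_SETTINGS:
        text = root.findtext(name)
        if text is not None:
            found[name] = int(text)
    return found

def choose_limit(wanted, settings):
    """Cap the requested page size at maxElementRepetitions when declared."""
    cap = settings.get("maxElementRepetitions")
    return wanted if cap is None else min(wanted, cap)

settings = read_settings(SETTINGS_XML)
print(settings)
print(choose_limit(500, settings))
```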

<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://rs.tdwg.org/tapir/1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0 
                              http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source accesspoint="http://example.net/tapir.cgi" 
            sendtime="2005-11-11T12:23:56.023+01:00">
      <software name="TapirProvider" version="1.0"/>
    </source>
  </header>
  <capabilities>
    <operations>
      <ping/>
      <metadata/>
      <capabilities/>
      <inventory>
        <anyConcepts/>
      </inventory>
      <search>
        <outputModels>
          <anyOutputModels>
            <basicSchemaLanguage/>
          </anyOutputModels>
        </outputModels>
      </search>
    </operations>
    <requests>
      <encoding>
        <kvp/>
        <xml/>
      </encoding>
      <globalParameters>
        <logOnly>accepted</logOnly>
        <xslt/>
      </globalParameters>
      <filter>
          <encoding>
            <expression>
              <concept/>
              <literal/>
              <parameter/>
              <variable/>
              <arithmetic/>
            </expression>
            <booleanOperators>
              <logical>
                <not/>
                <and/>
                <or/>
              </logical>
              <comparative>
                <equals caseSensitive="false"/>
                <greaterThan/>
                <greaterThanOrEquals/>
                <lessThan/>
                <lessThanOrEquals/>
                <in/>
                <isNull/>
                <like caseSensitive="false"/>
              </comparative>
            </booleanOperators>
          </encoding>
      </filter>
    </requests>
    <concepts>
      <schema namespace="http://example.net/s"
              location="http://example.net/s/schema.xsd">
        <mappedConcept id="http://example.net/s/CatalogNumber"/>
        <mappedConcept id="http://example.net/s/ScientificName"/>
      </schema>
    </concepts>
    <variables/>
    <settings>
      <minQueryTermLength>2</minQueryTermLength>
      <maxElementRepetitions>100</maxElementRepetitions>
      <maxElementLevels>20</maxElementLevels>
    </settings>
  </capabilities>
</response>

Example: Example of a full capabilities response document

6.3. Inventory

The Inventory operation is used to retrieve distinct values for one or more concepts specified as parameters. It returns aggregated data in the mode of a DISTINCT select in SQL, as opposed to individual records returned by the Search operation. When more than one concept is specified, inventory responses must return distinct combinations of values.

Concepts used as parameters may come from different conceptual schemas and must be specified either with their fully qualified identifiers or with aliases. If aliases are used, the TAPIR implementation must be configured to use the relevant concept name server.

If a provider supports the inventory operation, it must advertise either one or more inventory templates or support the <anyConcepts> capability. Providers may support both options and choose how they wish to process requests.

6.3.1. Inventory Request

Inventory requests can make use of inventory templates or may specify the concept(s) and an optional filter directly in the message. Paging parameters can also be used.

In XML, the inventory operation can be invoked by inserting the <inventory> element after the header section in a request document, and then specifying an inventory template or the specific parameters.

<?xml version="1.0" encoding="UTF-8"?>
<request xmlns="http://rs.tdwg.org/tapir/1.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                             http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source sendtime="2005-11-11T12:23:56.023+01:00">
    </source>
  </header>
  <inventory>
    <template location="http://example.net/tmpl/genus_inventory.xml"/>
  </inventory>
</request>

Example: Example of an XML inventory request document using a template.

In KVP request encoding, the same example could be invoked with:

http://example.net/tapir.cgi?op=inventory&template=http://example.net/tmpl/genus_inventory.xml

If the template included a filter with a parameter "type" restricting results according to the basis of record, it could be invoked with

http://example.net/tapir.cgi?op=inventory&template=http://example.net/tmpl/genus_inventory.xml&type=specimen

<?xml version="1.0" encoding="UTF-8"?>
<request xmlns="http://rs.tdwg.org/tapir/1.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                             http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source sendtime="2005-11-11T12:23:56.023+01:00">
    </source>
  </header>
  <inventory>
    <concepts>
      <concept id="http://example.net/schema1/Country"/>
      <concept id="http://example.net/schema1/Genus"/>
    </concepts>
  </inventory>
</request>

Example: Example of an XML inventory request document (looking for unique combinations of genus and country) specifying concepts but no filter.

In KVP request encoding, the same example could be invoked with

http://example.net/tapir.cgi?op=inventory&concept=http://example.net/schema1/Country&concept=http://example.net/schema1/Genus

or, using concept aliases

http://example.net/tapir.cgi?op=inventory&count=false&start=0&limit=100&concept=Country@schema1&concept=Genus@schema1

<?xml version="1.0" encoding="UTF-8"?>
<request xmlns="http://rs.tdwg.org/tapir/1.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                             http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source sendtime="2005-11-11T12:23:56.023+01:00">
    </source>
  </header>
  <inventory count="true" limit="100" start="0">
    <concepts>
      <concept id="http://example.net/schema1/Country" tagName="country"/>
      <concept id="http://example.net/schema1/Genus" tagName="genus"/>
    </concepts>
    <filter>
      <like>
        <concept id="http://example.net/schema1/Genus"/>
        <literal value="Luzu*"/>
      </like>
    </filter>
  </inventory>
</request>

Example: Example of an XML inventory request document (looking for unique combinations of genus and country) specifying concepts, custom tag names for the resulting values, paging parameters and a filter.

In KVP request encoding, the same example could be invoked with

http://example.net/tapir.cgi?op=inventory&count=true&start=0&limit=100&concept=Country@schema1&concept=Genus@schema1&tagname=country&tagname=genus&filter=Genus@schema1 like "Luzu*"
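The KVP examples in this section are written in a readable, unencoded form. When sent over HTTP, characters such as spaces and quotes in parameter values normally need to be percent-encoded. The Python sketch below shows one way a client might build the last request; the function name build_kvp_url is hypothetical, and the use of standard form encoding is an assumption of this sketch rather than something stated in this section.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

def build_kvp_url(access_point, params):
    """Append percent-encoded key-value pairs to the access point URL."""
    return access_point + "?" + urlencode(params)

# A list of pairs (not a dict) keeps repeated keys such as "concept"
# and "tagname" in the order in which they were given.
params = [
    ("op", "inventory"),
    ("count", "true"),
    ("start", "0"),
    ("limit", "100"),
    ("concept", "Country@schema1"),
    ("concept", "Genus@schema1"),
    ("tagname", "country"),
    ("tagname", "genus"),
    ("filter", 'Genus@schema1 like "Luzu*"'),
]

url = build_kvp_url("http://example.net/tapir.cgi", params)
print(url)

# Decoding the query string gives back the original pairs.
decoded = parse_qsl(urlsplit(url).query)
```

Keeping the concept and tagname pairs in matching order matters here, because each tag name is associated with a concept by position.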

6.3.2. Inventory Response

The structure of inventory responses must conform to the "inventoryResultType" defined in the TAPIR XML Schema. The body of the inventory message must list the concepts used to create the inventory using the <concepts> element. Individual inventory records, which represent unique combinations of multiple concepts, are returned as one or more <record> elements.

Each <record> element lists the value or values found in the order that concepts are listed under <concepts>. If count was requested, then each <record> must include an attribute @count, giving the number of occurrences of this combination in the underlying data source. If paging was requested, then a <summary> element must also be returned.

<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://rs.tdwg.org/tapir/1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0 
                              http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source accesspoint="http://example.net/tapir.cgi" 
            sendtime="2005-11-11T12:23:56.023+01:00">
      <software name="TapirService" version="1.0"/>
    </source>
  </header>
  <inventory>
    <concepts>
      <concept id="http://example.net/schema1/Country"/>
      <concept id="http://example.net/schema1/Genus"/>
    </concepts>
    <record>
      <value>AUSTRALIA</value>
      <value>Calicium</value>
    </record>
    <record>
      <value>AUSTRALIA</value>
      <value>Fellhanera</value>
    </record>
    <summary start="0" next="2" totalReturned="2" totalMatched="35"/>
  </inventory>
</response>

Example: An inventory response showing country and genus combinations
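A client could turn such a response into rows keyed by concept identifier, relying on the rule that values appear in the order in which the concepts are listed. The following Python sketch illustrates this; the function name read_inventory is hypothetical, and the response is reduced to the parts that are needed here.

```python
import xml.etree.ElementTree as ET

NS = {"t": "http://rs.tdwg.org/tapir/1.0"}

# Reduced copy of the response above (header omitted for brevity).
RESPONSE_XML = """<response xmlns="http://rs.tdwg.org/tapir/1.0">
  <inventory>
    <concepts>
      <concept id="http://example.net/schema1/Country"/>
      <concept id="http://example.net/schema1/Genus"/>
    </concepts>
    <record><value>AUSTRALIA</value><value>Calicium</value></record>
    <record><value>AUSTRALIA</value><value>Fellhanera</value></record>
    <summary start="0" next="2" totalReturned="2" totalMatched="35"/>
  </inventory>
</response>"""

def read_inventory(xml_text):
    """Return (concept ids, rows, summary attributes) of an inventory response."""
    inventory = ET.fromstring(xml_text).find("t:inventory", NS)
    concept_ids = [c.get("id") for c in inventory.findall("t:concepts/t:concept", NS)]
    rows = []
    for record in inventory.findall("t:record", NS):
        # Values are matched to concepts by position, so the element
        # names inside <record> do not need to be inspected.
        values = [child.text for child in record]
        row = dict(zip(concept_ids, values))
        if record.get("count") is not None:
            row["count"] = int(record.get("count"))
        rows.append(row)
    summary = inventory.find("t:summary", NS)
    return concept_ids, rows, dict(summary.attrib) if summary is not None else {}

concept_ids, rows, summary = read_inventory(RESPONSE_XML)
print(rows)
print(summary)
```

Because values are read by position, the same function would also handle records whose values are enclosed in custom tag names.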

If the request references an unmapped concept, an <error> should be returned.

When the request specifies a custom "tagName" for the concept, then this name should be used instead of the default <value> tag.

<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://rs.tdwg.org/tapir/1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0 
                              http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source accesspoint="http://example.net/tapir.cgi" 
            sendtime="2005-11-11T12:23:56.023+01:00">
      <software name="TapirService" version="1.0"/>
    </source>
  </header>
  <inventory>
    <concepts>
      <concept id="http://example.net/schema1/Country"/>
      <concept id="http://example.net/schema1/Genus"/>
    </concepts>
    <record>
      <country>AUSTRALIA</country>
      <genus>Calicium</genus>
    </record>
    <record>
      <country>AUSTRALIA</country>
      <genus>Fellhanera</genus>
    </record>
    <summary start="0" next="2" totalReturned="2" totalMatched="35"/>
  </inventory>
</response>

Example: An inventory response showing country and genus combinations with values enclosed by custom tag names

6.4. Search

The Search operation is used to return non-aggregate records from data sources. Search requests make use of output models and filters to select the requested data. The returned records may also be counted and paged.

If a provider supports the search operation, it must advertise either one or more search templates or it must support the <outputModels> capability. Providers may support both options and choose how they wish to process requests. If a provider supports the <outputModels> capability, it must advertise either one or more known output models or it must support the <anyOutputModels> capability.

6.4.1. Search Request

Search requests can make use of search templates or specify all parameters directly in the message. Paging parameters can also be used.

In XML, the search operation can be invoked by inserting the <search> element after the header section in a request document, and then specifying a search template or the specific parameters.

<?xml version="1.0" encoding="UTF-8"?>
<request xmlns="http://rs.tdwg.org/tapir/1.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                             http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source sendtime="2005-11-11T12:23:56.023+01:00">
    </source>
  </header>
  <search>
    <template location="http://example.net/tmpl/search_by_taxon.xml"/>
  </search>
</request>

Example: Example of an XML search request document using a template.

In KVP request encoding, the same example could be invoked with

http://example.net/tapir.cgi?op=search&template=http://example.net/tmpl/search_by_taxon.xml

If the template included a filter with a parameter "genus" restricting results according to a specified genus name, it could be invoked with

http://example.net/tapir.cgi?op=search&template=http://example.net/tmpl/search_by_taxon.xml&genus=Physalis

In addition to referring to external output models (both those known to the provider and user-defined ones) by their URI, it is possible to declare an output structure directly within a request document through the <outputModel> element, whose structure is identical to that used in external models.

<?xml version="1.0" encoding="UTF-8"?>
<request xmlns="http://rs.tdwg.org/tapir/1.0" 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                             http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source sendtime="2005-11-11T12:23:56.023+01:00"/>
  </header>
  <search count="true" start="0" limit="1000">
    <outputModel>
      <structure>... concepts and relationships ... </structure>
      <indexingElement>... node path(s) for paging and counting ...</indexingElement>
      <mapping>... concept mapping elements ...</mapping>
    </outputModel>
    <filter>
      <like>
        <concept id="http://example.net/schema/ScientificName"/>
        <literal value="Luzu*"/>
      </like>
    </filter>
    <orderBy>
      <concept id="http://example.net/schema/Family"/>
      <concept id="http://example.net/schema/ScientificName"/>
    </orderBy>
  </search>
</request>

Example: Simplified example of an XML search document with in-line outputModel definition

When used in KVP encoding, output models must always be externally defined, and referenced by the parameter "model". Output model definitions cannot be encoded in KVP.

http://example.net/tapir.cgi?op=search&start=0&limit=10&model=http://example.net/models/specimens.xml&filter=http://example.net/schema/ScientificName like "Luzu*"&orderby=http://example.net/schema/ScientificName

The same example with concept aliases would be

http://example.net/tapir.cgi?op=search&start=0&limit=10&model=http://example.net/models/specimens.xml&filter=ScientificName@schema like "Luzu*"&orderby=ScientificName@schema

Search operations also include the possibility to remove the TAPIR envelope, which consists of the response element, the header, the search element, the summary and the diagnostics. This is controlled by the parameter "envelope". In XML it is an optional attribute (defaults to true) of the <search> element, and in KVP it is an independent parameter with the same name.

When the envelope is turned off, it is possible to instruct a provider to completely ignore namespace declarations and prefixes in the response. This is done through the "omit-ns" optional attribute (which defaults to false) of the element search.

If an error occurs when the envelope is turned off, the response should be an "error" element containing the error message. It should usually be possible to get more information about the error by sending another request with the envelope turned on and then inspecting the diagnostics.

6.4.2. Search Response

Regardless of how a search query is formulated, the provider software processes it to select data from its underlying data source and returns the result to the client in the form of an XML response message. The way in which the provider chooses to do this is not defined by the protocol. Search responses with the TAPIR envelope must validate against the "searchResultType" defined by the TAPIR XML Schema.

<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://rs.tdwg.org/tapir/1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                              http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
  <header>
    <source accesspoint="http://example.net/tapir.cgi" 
            sendtime="2005-11-11T12:23:56.023+01:00"/>
  </header>
  <search>
    <dataset xmlns="http://example.net/simple_specimen">
      <specimen catnum="234">
         <identification>
           <name>Luzula luzuloides</name>
         </identification>
      </specimen>
      <specimen catnum="290">
        <identification>
          <name>Luzula alpestris</name>
        </identification>
      </specimen>
    </dataset>
    <summary start="0" totalReturned="2"/>
  </search>
</response>

Example: Example of a search response document.

6.5. Ping

The Ping operation provides a means of establishing whether services are currently on-line and whether appropriate wrapper software is installed. This operation can also provide basic data about response times. It does not require a query to be run against a connected database, as is sometimes required of metadata and capabilities requests. Data providers are free to include as part of diagnostics any extra information that may be of value to monitor networks.

6.5.1. Ping Request

In XML, the ping operation is invoked by inserting the <ping/> element after the header section in a request document. Ping takes no arguments or parameters.

<?xml version="1.0" encoding="UTF-8"?>
<request xmlns="http://rs.tdwg.org/tapir/1.0"
                xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
                                    http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
<header>
  <source sendtime="2005-11-11T12:23:56.023+01:00">
       <software name="TapirClient" version="1.0"/>
  </source>
</header>
  <ping/>
</request> 

Example: TAPIR Ping Request. The simplest operation in TAPIR.

In KVP request encoding, the ping operation can be invoked with a single parameter

http://example.net/tapir.cgi?op=ping

6.5.2. Ping Response

<?xml version="1.0" encoding="UTF-8"?>
<response xmlns="http://rs.tdwg.org/tapir/1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://rs.tdwg.org/tapir/1.0
          http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd">
<header>
  <source accesspoint="http://example.net/tapir.cgi" 
          sendtime="2005-11-11T12:23:57.023+01:00">
       <software name="TapirProvider" version="1.0"/>
  </source>
</header>
  <pong/>
</response> 
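A monitoring client might decide whether a service is alive by checking that the response is a TAPIR <response> document containing a <pong/> element, as in the example above. The Python sketch below illustrates such a check; the function name is_pong is hypothetical and the sample documents are shortened.

```python
import xml.etree.ElementTree as ET

TAPIR_NS = "http://rs.tdwg.org/tapir/1.0"

def is_pong(xml_text):
    """Return True if the text is a TAPIR response that contains <pong/>."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Not well-formed XML, e.g. an HTML error page from a web server.
        return False
    return (root.tag == f"{{{TAPIR_NS}}}response"
            and root.find(f"{{{TAPIR_NS}}}pong") is not None)

# Shortened sample documents (headers left empty for brevity).
PONG = f'<response xmlns="{TAPIR_NS}"><header/><pong/></response>'
NOT_PONG = f'<response xmlns="{TAPIR_NS}"><header/><error>service unavailable</error></response>'

print(is_pong(PONG), is_pong(NOT_PONG), is_pong("<html>oops"))
```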

7. Global Parameters

All operations can make use of global common parameters. In XML, global parameters are passed as specific attribute values of the operation element, and are defined in the "globalParametersGroup". In KVP, they are passed as individual parameters. The list of global parameters includes:

log-only
The log-only parameter can be used to instruct the service to log the request but not process it as it would normally do. It can be used to forward requests to the original providers, when users are querying third-party cached databases. This way, original providers can track usage of their data. Although it is possible to send log-only requests, clients must check the provider capabilities response before sending them. The <globalParameters> element of capabilities responses declares if log requests are required, accepted or denied. Providers that are unable to log requests on their system or do not want to receive them, should reply to a log-only request with an error message using the <error> element. If log-only requests are accepted or required, then the response must include the element <logged/> instead. Boolean. Optional (defaults to "false").

xslt
The xslt parameter points to an XML style sheet that can be used by XSLT processors to transform TAPIR responses. When xslt is specified without an xslt-apply parameter set to true, providers must include the corresponding "xml-stylesheet" processing instruction after the XML header, as shown below. However, providers are also free to place restrictions on the allowable domains related to the stylesheet location. In this case, if a stylesheet comes from an unknown location, it can be ignored - but an associated warning should be raised inside the diagnostics section. String representing a URL. Optional.

<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="http://example.net/trans.xsl"?>
<!-- TAPIR response -->
Example: Fragment of a TAPIR response showing the XML headers when the xslt parameter is passed and not asked to be applied on the server side.

xslt-apply
The xslt-apply parameter can be used to indicate whether the stylesheet pointed to by the xslt parameter should be processed on the server side. Clients must check a provider’s capabilities response to see if xslt-apply is supported. If xslt-apply is requested but not supported by the provider, it must return the untransformed data with the xslt link included in the document as "XML Processing Instructions" and note the error in the <diagnostic> element. When xslt-apply is requested, but the corresponding xslt parameter was not passed, providers are also recommended to raise a diagnostic message. Boolean. Optional (defaults to "false").

8. Counting and Paging

Inventory and search operations include counting and paging functionality. Both are done with reference to an indexing element (search operation) or to a record element (inventory operation). In XML, counting and paging parameters are defined in the "pagingParametersGroup" and passed as attribute values of the operation element. In KVP, counting and paging is done through specific individual parameters.

count
For inventory operations, the count parameter instructs the service to return the total number of distinct records and the number of occurrences of each record in the data source that match the selection criteria. For search operations, the count parameter instructs the service to return the number of records returned and the number of records that matched the query in the @totalReturned and @totalMatched attributes of the <summary> element respectively. Boolean. Optional (default = "false").

start
Index of the first record to be returned when paging results. Non-negative integer. Optional (default = "0", which corresponds to the index of the first record).

limit
Indicates the maximum number of records to be returned when paging results. If a request asks for count but specifies limit = "0" then no records are returned but the total number of records found is recorded in @totalMatched. Non-negative integer. Optional (when not specified it means "unlimited").

In XML, a typical use of the paging attributes would be

  <search count="true" start="0" limit="50">
   ....... 
  </search>

The same parameters in KVP would be

http://example.net/tapir.cgi?op=search&count=1&start=0&limit=50&...

When paging is requested, response documents must include a <summary> element of the "resultSummaryType", which returns information in the following set of attributes:

@start
Index of the first element that was returned.

@next
Index of the next record that could be retrieved using a subsequent request.

@totalReturned
The number of records actually being returned.

@totalMatched
If count was requested the totalMatched attribute gives the "estimated" number of total matching records - not necessarily the number of valid records that can possibly be returned by paging through the entire record set. This happens because records can be "dropped" or ignored if they fail to produce a valid XML representation according to the response structure.
...
<record count="10">
  <value>AUSTRALIA</value>
  <value>Calicium</value>
</record>
<record count="20" >
  <value>AUSTRALIA</value>
  <value>Fellhanera</value>
</record>
<summary start="0" next="2" totalReturned="2" totalMatched="35"/>
...

Example: Fragment of an inventory response showing count and paging values.
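The @next attribute lets a client walk through a result set page by page. The Python sketch below shows one possible loop. It uses a stand-in function, fake_fetch, in place of a real HTTP request, and it assumes that a summary without @next means there is nothing more to retrieve; both are assumptions of this sketch.

```python
def fake_fetch(start, limit):
    """Stand-in for a TAPIR request: returns (records, summary attributes)."""
    data = [f"record-{i}" for i in range(7)]  # pretend data source
    page = data[start:start + limit]
    summary = {"start": str(start), "totalReturned": str(len(page))}
    if start + len(page) < len(data):
        summary["next"] = str(start + len(page))
    return page, summary

def fetch_all(fetch, limit):
    """Collect all records by following @next from one page to the next."""
    records = []
    start = 0
    while True:
        page, summary = fetch(start, limit)
        records.extend(page)
        next_index = summary.get("next")
        # Stop when no further page is announced, or when @next does not
        # advance (a guard against looping forever on a faulty response).
        if next_index is None or int(next_index) <= start:
            return records
        start = int(next_index)

all_records = fetch_all(fake_fetch, limit=3)
print(len(all_records), all_records[-1])
```

The loop relies on @next rather than on adding @totalReturned to @start, since records may be dropped from a page.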

9. Filters and Expressions

9.1. Filters

Inventory and search operations may contain a filter specifying conditions to restrict returned data to a specific subset. TAPIR filters encode expressions and operators in an atomised form that can be translated to other query languages (e.g., SQL).

The ability to dynamically parse filters is optional in TAPIR and can be expressed as part of the capabilities response. Providers that do not support filters may still support query templates advertised in their capabilities. In this case, when the query template includes a filter in its definition, the meaning and the functionality of the filter must be understood by the provider. The provider may hard-code a local query that translates the entire filter and then, when processing a request, substitute the parameters that are usually present in query template filters with their respective values.

9.2. Expressions

The atomised values in a TAPIR filter are represented by expressions. There are three types of expressions in TAPIR: simple expressions, complex expressions and variables. Expressions evaluate to a single value and are used by filter operators.

9.2.1. Simple Expressions

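Simple expressions include elements that are directly associated with a single value. There are three possible types of simple expressions: concepts, literals and parameters, represented by the <concept>, <literal> and <parameter> elements respectively.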

9.2.2. Complex Expressions

Complex expressions are represented by four arithmetic operators. The operators and their respective XML elements are addition (<add>), subtraction (<sub>), division (<div>) and multiplication (<mul>).

Arithmetic operators are binary, so they always combine exactly two expressions as their arguments. The first argument must always be associated with the leftmost expression in the operation. This means that in subtractions the first expression corresponds to the minuend and the second corresponds to the subtrahend. In divisions the first expression corresponds to the dividend while the second corresponds to the divisor.

<add>
  <literal value="20" />
  <literal value="22" />
</add>

Example: Use of a binary arithmetic operator.
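
The argument order described above can be illustrated with a small evaluator. This is a minimal sketch in Python, not a reference implementation: the function name `evaluate` is hypothetical, literals are assumed to be numeric and are read as floats, and the element names `sub`, `mul` and `div` are assumed to mirror the operator names used in the KVP filter grammar.

```python
import operator
import xml.etree.ElementTree as ET

# Assumed element names for the four binary arithmetic operators.
ARITHMETIC = {
    "add": operator.add,
    "sub": operator.sub,      # first child is the minuend
    "mul": operator.mul,
    "div": operator.truediv,  # first child is the dividend
}


def evaluate(element):
    """Recursively evaluate a <literal> or an arithmetic operator element."""
    if element.tag == "literal":
        # Assumption for this sketch: every literal holds a number.
        return float(element.get("value"))
    if element.tag in ARITHMETIC:
        operands = list(element)
        if len(operands) != 2:
            raise ValueError(f"<{element.tag}> needs exactly two arguments")
        left = evaluate(operands[0])   # leftmost expression comes first
        right = evaluate(operands[1])
        return ARITHMETIC[element.tag](left, right)
    raise ValueError(f"unsupported expression element: <{element.tag}>")


total = evaluate(ET.fromstring(
    '<add><literal value="20" /><literal value="22" /></add>'
))
```

Concepts, parameters and variables are left out of the sketch because their values depend on the provider's data.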

9.2.3. Variables

Variables are elements that represent environment variables from the data provider system. TAPIR defines the following system variables that may be supported by provider implementations:

In XML, variables are represented by a <variable> element with a "name" attribute, such as

<variable name="date" />

Variables that are supported by a data provider must be advertised in capabilities responses. Data providers are also free to define and make use of additional system variables.

9.3. Logical and Comparative Operators

TAPIR supports a range of logical and comparative operators for building filters.

9.3.1. Comparative Operators

There are three types of comparative operators - unary, binary, and multiple.

9.3.1.1. Unary Comparative Operators

Unary comparative operators always take a single concept as argument. The only operator of this type is the isNull operator.

9.3.1.2. Binary Comparative Operators

Binary comparative operators always compare a concept with an expression. The first argument must always be the concept and is associated with the leftmost expression in the operation. The binary operators are equals, like, greaterThan, lessThan, greaterThanOrEquals and lessThanOrEquals.

9.3.1.3. Multiple Comparative Operators

Multiple comparative operators always compare a concept with one or more simple expressions. This operator is equivalent to a sequence of "or" operators comprising equals comparisons between the concept and each simple expression. The only operator of this type is the "in" operator.
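
The equivalence between "in" and a sequence of "or" operators can be sketched as a simple rewrite. The Python function below is illustrative only: the name `expand_in` is hypothetical, the output uses the infix notation of the KVP filter grammar, and values are assumed to be plain strings that contain no double quotes.

```python
def expand_in(concept, values):
    """Rewrite 'concept in (v1, v2, ...)' as equals comparisons joined by or."""
    if not values:
        raise ValueError("the in operator needs at least one expression")
    # One equals comparison per value, each in its own parenthesised block.
    comparisons = [f'({concept} equals "{value}")' for value in values]
    return " or ".join(comparisons)


expanded = expand_in("country@dwc2", ["Spain", "Portugal"])
```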

9.3.2. Logical Operators

There are two types of logical operators – unary and multiple.

9.3.2.1. Unary Logical Operators

Unary logical operators take as argument a single Boolean operator, which can be any comparison operator or any logical operator.

9.3.2.2. Multiple Logical Operators

Multiple logical operators combine two or more Boolean operators, which can be any comparison operator or any logical operator.
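
To make the nesting of Boolean operators concrete, the following Python sketch evaluates a small filter tree against a single record. The tuple representation, the function name `matches` and the record dictionary are all assumptions chosen for illustration. Only "not", "and", "or", "isNull" and "equals" are covered, and case-sensitivity settings are ignored.

```python
def matches(node, record):
    """Evaluate a filter node, given as a nested tuple, against a record dict."""
    name = node[0]
    if name == "not":
        # Unary logical operator: exactly one Boolean operator as argument.
        return not matches(node[1], record)
    if name == "and":
        # Multiple logical operator: two or more Boolean operators.
        return all(matches(child, record) for child in node[1:])
    if name == "or":
        return any(matches(child, record) for child in node[1:])
    if name == "isNull":
        return record.get(node[1]) is None
    if name == "equals":
        return record.get(node[1]) == node[2]
    raise ValueError(f"unsupported operator: {name}")


record = {"country": "Spain", "genus": None}
tree = ("and",
        ("equals", "country", "Spain"),
        ("not", ("isNull", "country")),
        ("or", ("isNull", "genus"), ("equals", "genus", "Abies")))
result = matches(tree, record)
```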

10. KVP (Key-Value Pair) Requests

TAPIR requests can be encoded as KVP, as opposed to the XML encoding, and can be sent through HTTP GET or HTTP POST. Therefore, interaction with a TAPIR service can be done by means of URLs using CGI-style parameters. For instance, to ping a TAPIR service one can use the simple KVP GET encoded message

http://example.net/tapir.cgi?op=ping

All TAPIR operations can be invoked with KVP, though output model definitions cannot be expressed with KVP.
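
A KVP GET request is simply a URL with an encoded query string. The following Python sketch builds one with the standard library. The helper name `kvp_url` is hypothetical and the access point is the fictitious one used in the example above.

```python
from urllib.parse import urlencode


def kvp_url(access_point, parameters):
    """Append KVP parameters, given as (name, value) pairs, to an access point."""
    query = urlencode(parameters)
    return f"{access_point}?{query}" if query else access_point


ping_url = kvp_url("http://example.net/tapir.cgi", [("op", "ping")])
```

The same encoded query string could be sent as the body of an HTTP POST instead of being appended to the URL.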

10.1. Parameter Rules

Parameter names are always case insensitive. Parameter values are case insensitive by default, except when they are used in equals or like comparisons and the provider has explicitly declared these operators to be case sensitive (see capabilities response for more details).

Parameters may be specified in any order, and any unknown parameters can be ignored. Parameters without values can also be ignored.

When creating custom parameters in filters, it is necessary to make sure that their names do not conflict with TAPIR specific parameters (see Appendix for the full list of reserved parameter names).
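
One possible way for a provider to apply these rules when reading a query string is sketched below in Python. The function name `normalise` and the set `KNOWN_GLOBALS` are hypothetical. The sketch only recognises the global parameter names of section 10.2, so a real implementation would also have to keep operation-specific and custom filter parameters.

```python
from urllib.parse import parse_qsl

# Only the global parameter names are treated as known in this sketch.
KNOWN_GLOBALS = {"op", "xslt", "xslt-apply", "log-only", "source-ip"}


def normalise(query_string, known=KNOWN_GLOBALS):
    """Return a dict of lower-cased parameter names mapped to value lists."""
    result = {}
    for name, value in parse_qsl(query_string, keep_blank_values=True):
        name = name.lower()          # parameter names are case insensitive
        if value == "":              # parameters without values are ignored
            continue
        if name not in known:        # unknown parameters are ignored
            continue
        result.setdefault(name, []).append(value)
    return result


parameters = normalise("OP=ping&Foo=bar&xslt=")
```

Order does not matter here because each parameter is looked up by name after parsing.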

10.2. Global Parameters

The following parameters can be used in all TAPIR operations:

op = [ p | ping | m | metadata | c | capabilities | i | inventory | s | search ]
Specifies the requested TAPIR operation.

  • default = metadata : if op is not specified the service must send back a metadata response
  • cardinality = 1..1 : only one operation is allowed per request

xslt = [ URI ]
Gives the address of an XML style sheet to be included after the XML header or to be applied to the returned data.

  • default = null
  • cardinality = 0..1

xslt-apply = [ true | false | 1 | 0 ]
Indicates if a given XML style sheet (parameter xslt) should be applied to the returned data before returning it.

  • default = false
  • cardinality = 0..1

log-only = [ true | false | 1 | 0 ]
Used to indicate if the request should only be logged, not processed. Returns a log message instead of data.

  • default = false
  • cardinality = 0..1

source-ip = [ URI | NONE ]
Used to indicate the IP address of the original client when the message was processed by intermediate agents.

  • default = null
  • cardinality = 0..1
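
The defaults and allowed values listed above could be resolved along the following lines. This Python sketch is an illustration under stated assumptions: the names `resolve_operation` and `parse_flag` are hypothetical, and rejecting unknown values with an exception is a design choice of the sketch rather than something required by the text above.

```python
OPERATIONS = {
    "p": "ping", "ping": "ping",
    "m": "metadata", "metadata": "metadata",
    "c": "capabilities", "capabilities": "capabilities",
    "i": "inventory", "inventory": "inventory",
    "s": "search", "search": "search",
}
FLAGS = {"true": True, "1": True, "false": False, "0": False}


def resolve_operation(values):
    """Map the list of values given for op to a full operation name."""
    if not values:
        return "metadata"            # default when op is not specified
    if len(values) > 1:
        raise ValueError("only one operation is allowed per request")
    name = values[0].lower()         # values are case insensitive by default
    if name not in OPERATIONS:
        raise ValueError(f"unknown operation: {values[0]}")
    return OPERATIONS[name]


def parse_flag(value, default=False):
    """Interpret a [ true | false | 1 | 0 ] value such as xslt-apply or log-only."""
    if value is None:
        return default
    key = value.lower()
    if key not in FLAGS:
        raise ValueError(f"not a boolean value: {value}")
    return FLAGS[key]
```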

10.3. Ping Parameters

op = [ ping | p ]
The ping request has no other parameters.

10.4. Metadata Parameters

op = [ metadata | m ]
The metadata request has no other parameters.

10.5. Capabilities Parameters

op = [ capabilities | c ]
The capabilities request has no other parameters.

10.6. Inventory Parameters

op = [ inventory | i ]

cnt, count = [ true | false | 1 | 0 | NONE ]
Indicates if the total number of distinct records and the number of occurrences for each record must be returned.
  • default = false
  • cardinality = 0..1

s, start = [ non-negative integer | NONE ]
Index of the first record to be returned.
  • default = 0 (defaults to the first record in the matching data)
  • cardinality = 0..1

l, limit = [ non-negative integer | NONE ]
The number of records to be returned.
  • default = NONE (all matching records are returned by default)
  • cardinality = 0..1

A choice must be made to use either a template, or one or more direct references to concepts with an optional filter.

t, template = [ URI ]
The URL of an Inventory Template document. When a template is present the concept and filter parameters are ignored.
  • default = null
  • cardinality = 0..1 (optional and only one template is allowed)

OR

c, concept = [ fully qualified identifiers or aliases ]
One or more concepts.
  • default = UNDEFINED (which must be interpreted by any provider implementation to mean no concept, which will result in an empty response)
  • cardinality = 0..n (at least one concept must be included if no template is specified)

n, tagname = [ string ]
One or more custom tag names (one for each concept).
  • default = "value"
  • cardinality = 0..n (optional, but when specified there must be one tagname for each concept)

f, filter = [ expression ]
A KVP filter.
  • default = null (no filter used)
  • cardinality = 0..1 (optional and only one filter statement is allowed)
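
A client-side helper for assembling an inventory query string might look like the Python sketch below. The function name `inventory_query` is hypothetical, and sending multi-valued parameters by repeating the key is an assumption of this sketch. The checks mirror the cardinality notes above: at least one concept when no template is given, and one tagname per concept when tagnames are used.

```python
from urllib.parse import urlencode


def inventory_query(concepts=(), tagnames=(), template=None,
                    filter_expression=None, count=False, start=0, limit=None):
    """Build the query string of a KVP inventory request."""
    pairs = [("op", "inventory")]
    if template is not None:
        # With a template, concept and filter parameters would be ignored.
        pairs.append(("template", template))
    else:
        if not concepts:
            raise ValueError("at least one concept is needed without a template")
        if tagnames and len(tagnames) != len(concepts):
            raise ValueError("there must be one tagname for each concept")
        pairs += [("concept", concept) for concept in concepts]
        pairs += [("tagname", tagname) for tagname in tagnames]
        if filter_expression is not None:
            pairs.append(("filter", filter_expression))
    if count:
        pairs.append(("count", "true"))
    if start:
        pairs.append(("start", str(start)))
    if limit is not None:
        pairs.append(("limit", str(limit)))
    return urlencode(pairs)


query = inventory_query(concepts=["country@dwc2"], count=True, limit=2)
```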

10.7. Search Parameters

op = [ search | s ]

cnt, count = [ true | false | 1 | 0 | NONE ]
Indicates if the count of the records returned in the response and the number of matching records should be returned.
  • default = false
  • cardinality = 0..1

s, start = [ non-negative integer | NONE ]
Index of the first record to be returned.
  • default = 0 (defaults to the first record in the matching data)
  • cardinality = 0..1

l, limit = [ non-negative integer | NONE ]
The number of records to return.
  • default = NONE (all matching records are returned by default)
  • cardinality = 0..1

e, envelope = [ true | false | 1 | 0 | NONE ]
Indicates whether the response should include the TAPIR envelope (response, header, search, summary and diagnostics tags). When false, the envelope is suppressed.
  • default = true
  • cardinality = 0..1

omit-ns = [ true | false | 1 | 0 | NONE ]
If "true" and if the envelope is turned off, the produced content should not contain any namespace, even if the model defines it.
  • default = false
  • cardinality = 0..1

A choice must be made to use either a template, or an output model parameter with optional "partial", "filter" and "orderby" parameters. The "template" parameter takes precedence over the "model", so if both are present the "model" (and optional "partial", "filter" and "orderby") parameters should be ignored.

t, template = [ URI ]
The URL of a Search Template document. When a template is present, the model, partial, filter, orderby and descend parameters should be ignored.
  • default = null
  • cardinality = 0..1

OR

m, model = [ URI ]
A pointer to an output model document.
  • default = null
  • cardinality = 0..1 (when both template and model are present, the model parameter must be ignored)

p, partial = [ XPath ]
Gives one or more XPaths to nodes from the output model schema that should be returned.
  • default = null
  • cardinality = 0..n

f, filter = [ expression ]
A KVP filter.
  • default = null
  • cardinality = 0..1

o, orderby = [ fully qualified concept identifiers or aliases ]
One or more concept identifiers to order results.
  • default = null (no particular order)
  • cardinality = 0..n

d, descend = [ true | false | 1 | 0 ]
Indicates if the order should be ascending or descending for each concept specified in "orderby". When present, it must have the same number of instances as "orderby".
  • default = false (ascending order)
  • cardinality = 0..n
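
The precedence of "template" over "model" and the pairing of "descend" with "orderby" can be sketched as follows in Python. The function name `resolve_search` and the shape of the returned dictionary are hypothetical. The input is assumed to be a dictionary of lower-cased full parameter names mapped to lists of values, with short aliases already expanded.

```python
def resolve_search(parameters):
    """Decide which search parameters take effect for a KVP search request."""
    templates = parameters.get("template", [])
    if templates:
        # A template wins: model, partial, filter, orderby and descend are ignored.
        return {"template": templates[0]}
    orderby = parameters.get("orderby", [])
    descend = parameters.get("descend", [])
    if descend and len(descend) != len(orderby):
        raise ValueError("descend must occur as many times as orderby")
    if descend:
        flags = [value.lower() in ("true", "1") for value in descend]
    else:
        flags = [False] * len(orderby)   # default is ascending order
    models = parameters.get("model", [])
    filters = parameters.get("filter", [])
    return {
        "model": models[0] if models else None,
        "partial": list(parameters.get("partial", [])),
        "filter": filters[0] if filters else None,
        "order": list(zip(orderby, flags)),  # (concept, descending) pairs
    }
```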

10.8. Filters

Filter expressions in KVP requests are infix equivalents of their XML counterparts.

10.8.1. Backus-Naur Form (BNF) Grammar for Filter Expressions

<expression>                    ::= <logical_operator> | <comparative_operator>

<comparative_operator>          ::= <unary_comparison_expression> |
                                    <binary_comparison_expression> | 
                                    <unbound_comparison_expression>

<logical_operator>              ::= <unary_logical_expression> | <binary_logical_expression>

<literal>                       ::= '"' <string> '"'

<concept>                       ::= <concept_alias> | <qualified_concept>

<concept_alias>                 ::= <local_concept_alias> "@" <namespace_alias>

<local_concept_alias>           ::= <string>

<namespace_alias>               ::= <string>

<qualified_concept>             ::= <string>

<value>                         ::= <literal> | <concept> | <arithmetic_expression>

<arithmetic_expression>         ::= <value> <arithmetic_operator> <value>

<unary_comparison_expression>   ::= <unary_comparison_operator> <concept>

<binary_comparison_expression>  ::= <value> <binary_comparison_operator> <value>

<unbound_comparison_expression> ::= <unbound_comparison_operator> <expression> {<expression>}

<unary_logical_expression>      ::= <unary_logical_operator> <expression>

<binary_logical_expression>     ::= <expression> <binary_logical_operator> <expression>

<unary_comparison_operator>     ::= "isNull"

<binary_comparison_operator>    ::= "equals" | "like" | "greaterThan" | "lessThan" | 
                                    "greaterThanOrEquals" | "lessThanOrEquals"

<unbound_comparison_operator>   ::= "in"

<unary_logical_operator>        ::= "not"

<binary_logical_operator>       ::= "and" | "or"

<arithmetic_operator>           ::= <add> | <div> | <mul> | <sub>

<add>                           ::= "+"
<sub>                           ::= "-"
<mul>                           ::= "*"
<div>                           ::= "/"

<string>                        ::= <any_char> { <any_char> }

10.8.2. Operator Precedence

The lists below show the precedence of filter operators, from highest to lowest:

expressions: arithmetic_operator, comparative_operator, logical_operator
arithmetic_operators: mul, div, add, sub
comparative_operators: isNull, equals, like, greaterThan, lessThan, greaterThanOrEquals, lessThanOrEquals
logical_operators: not, and, or

Blocks can be formed by using simple parentheses ( ).

10.8.3. Examples

isnull country@dwc2 or FullScientificName@abcd206 like "Abies*" and country@dwc2 equals "Spain"

The same example can be more explicit using parentheses, as follows:

((isnull country@dwc2) or ((FullScientificName@abcd206 like "Abies*") and (country@dwc2 equals "Spain")))
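
The way precedence turns the first form into the second can be sketched with a small recursive-descent parser. The Python code below is a simplified illustration, not a full implementation of the grammar: it handles only "not", "and", "or", "isNull" and the binary comparison operators, it omits arithmetic expressions and "in", and it matches operator names case-insensitively. The names `FilterParser` and `parenthesise` are hypothetical.

```python
import re

TOKEN = re.compile(r'"[^"]*"|[()]|[^\s()"]+')
COMPARISONS = {"equals", "like", "greaterthan", "lessthan",
               "greaterthanorequals", "lessthanorequals"}


class FilterParser:
    def __init__(self, text):
        self.tokens = TOKEN.findall(text)
        self.position = 0

    def peek(self):
        if self.position < len(self.tokens):
            return self.tokens[self.position]
        return None

    def take(self):
        token = self.peek()
        if token is None:
            raise ValueError("unexpected end of filter")
        self.position += 1
        return token

    def peek_is(self, keyword):
        token = self.peek()
        return token is not None and token.lower() == keyword

    def parse(self):
        result = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token: {self.peek()}")
        return result

    def parse_or(self):              # lowest precedence
        left = self.parse_and()
        while self.peek_is("or"):
            keyword = self.take()
            right = self.parse_and()
            left = f"({left} {keyword} {right})"
        return left

    def parse_and(self):             # binds tighter than "or"
        left = self.parse_not()
        while self.peek_is("and"):
            keyword = self.take()
            right = self.parse_not()
            left = f"({left} {keyword} {right})"
        return left

    def parse_not(self):             # binds tighter than "and"
        if self.peek_is("not"):
            keyword = self.take()
            return f"({keyword} {self.parse_not()})"
        return self.parse_comparison()

    def parse_comparison(self):      # comparisons bind tighter than logic
        token = self.take()
        if token == "(":
            inner = self.parse_or()
            if self.take() != ")":
                raise ValueError("missing closing parenthesis")
            return inner
        if token.lower() == "isnull":
            return f"({token} {self.take()})"
        operator_token = self.take()
        if operator_token.lower() not in COMPARISONS:
            raise ValueError(f"unknown comparison operator: {operator_token}")
        return f"({token} {operator_token} {self.take()})"


def parenthesise(text):
    """Return the filter with every operator wrapped in explicit parentheses."""
    return FilterParser(text).parse()
```

Applied to the first example above, `parenthesise` yields the explicit form shown in the second example.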

11. The TAPIR XML Schema

The official version of the TAPIR XML Schema (tapir.xsd) is located at:

http://rs.tdwg.org/tapir/1.0/schema/tapir.xsd

12. Appendix

12.1. Reserved Parameter Names

Parameter names defined in filters can be any string valid according to the HTTP Common Gateway Interface (CGI) standard. However, because TAPIR operations can be called through pure KVP requests, some parameter names are reserved as TAPIR parameters and cannot be used as parameter names in filters.

The following parameter names are reserved for TAPIR:

c
cnt
concept
count
d
descend
e
envelope
f
filter
l
limit
log-only
m
model
n
o
omit-ns
op
orderby
p
partial
s
start
t
tagname
template
xslt
xslt-apply
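
A check against this list is straightforward to automate. The Python sketch below uses the hypothetical names `RESERVED` and `is_reserved`. The comparison is case-insensitive because KVP parameter names are case insensitive.

```python
# Reserved names copied from the list above.
RESERVED = frozenset({
    "c", "cnt", "concept", "count", "d", "descend", "e", "envelope",
    "f", "filter", "l", "limit", "log-only", "m", "model", "n", "o",
    "omit-ns", "op", "orderby", "p", "partial", "s", "start", "t",
    "tagname", "template", "xslt", "xslt-apply",
})


def is_reserved(name):
    """Return True if a custom filter parameter name clashes with TAPIR."""
    return name.lower() in RESERVED
```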

12.2. Term Definitions

ABCD: Access to Biological Collection Data. See http://www.bgbm.org/TDWG/CODATA/Schema/default.htm.
Access Point: The URL (web address) of a Web Service.
Backus-Naur Form: A metasyntax used to express context-free grammars. See http://en.wikipedia.org/wiki/Backus-Naur_form.
BioCASe: Biological Collections Access Service. See http://www.biocase.org.
CNS: Concept Name Server. A service to get information about existing conceptual schemas and their concepts.
Concept: Definition of a property, class or relationship.
Conceptual schema: A formal definition of concepts. It can also be seen as a data model or ontology.
Data source: The term used in the BioCASe project for an access point.
Dublin Core: Dublin Core Metadata Initiative. See http://dublincore.org.
DiGIR: Distributed Generic Information Retrieval. See http://digir.net.
Federation schema: A conceptual schema adopted by a federation.
GBIF: Global Biodiversity Information Facility. See http://www.gbif.org.
GET: HTTP communication method where form data are encoded as parameters in an extension to a URL. The GET method is principally used to transmit requests for data to a web server (e.g., a simple database search).
HTML: Hypertext Markup Language. A subset of Standard Generalised Markup Language (SGML), used for authoring pages for the World Wide Web.
HTTP: Hypertext Transfer Protocol, the commonly used protocol for transmitting requests and documents between applications on the World Wide Web.
KVP: Key-Value Pair. One of the possible encodings for TAPIR requests.
OGC: Open Geospatial Consortium. See http://www.opengeospatial.org.
OMG: Object Management Group. See http://www.omg.org/.
NCD: Natural Collections Descriptions. A TDWG emerging standard for describing collections of natural history material. See http://www.tdwg.org/NCD/TDWG_NCD_Subgroup.htm.
normative: Referring to a standard or set of norms that are understood to be correct. A normative document is one which describes how things ought to be and why.
Output Model: An XML schema language (or potentially other) formatted response structure.
POST: An HTTP communication method that can include any kind of data or command. The data are encoded separately and do not form part of the URL as in a GET message, so this method is better for complex, sensitive, lengthy or non-ASCII data.
protocol: An agreed format for transmitting data between two or more devices.
Provider: Originally defined as an organisation hosting either a DiGIR or a BioCASe service. In the context of TAPIR, an organisation hosting a TAPIR access point, which may point to several data sources.
Provider software: Software running on a web server that facilitates access to data.
RDF Schema: A language for describing vocabularies in the Resource Description Framework (RDF). See http://www.w3.org/TR/rdf-schema/.
SDD: Structure of Descriptive Data. A TDWG, XML-based interoperability standard for descriptive data.
SOAP: Simple Object Access Protocol, an XML-based messaging protocol used for invoking web services and exchanging structured data.
TAPIR: TDWG Access Protocol for Information Retrieval.
TCS: Taxonomic Concept Transfer Schema. An XML schema for the exchange of taxon concepts. See http://tdwg.napier.ac.uk.
TDWG: Taxonomic Databases Working Group. See http://www.tdwg.org/.
TSA: The Species Analyst, a research project developing standards and software tools for sharing biodiversity information. See http://speciesanalyst.net/.
UDDI: Universal Description, Discovery and Integration. UDDI is a specification for maintaining standardised directories of information about web services.
URL: Uniform Resource Locator. The address of a resource on the Internet.
URI: Uniform Resource Identifier. A formatted string that serves as an identifier for a resource, typically, but not exclusively, on the Internet. URIs are used in HTML hyperlinks.
W3C: World Wide Web Consortium. See http://www.w3c.org.
Web Service: A service based on Internet Protocols, such as HTTP, SMTP or FTP, and also based on XML.
WFS: Web Feature Services. An Open Geospatial Consortium XML-based standard to enable transfer of geographic feature data using Geography Markup Language (GML). See http://schemas.opengis.net/wfs/.
wrapper: Software that allows standardised queries to be run against an underlying database.
WSDL: Web Services Description Language. An XML format for describing Web Services as a set of end points operating on messages containing either document-oriented or procedure-oriented information. WSDL is the language used by UDDI.
XMI: XML Metadata Interchange. An OMG standard for exchanging metadata information via XML. See http://www.omg.org/technology/documents/formal/xmi.htm.
XML: Extensible Markup Language developed by the W3C. A means of tagging data for transmission, validation and manipulation. See http://www.w3.org/XML and http://www.w3.org/TR/REC-xml.
XML Schema: A formal definition of the required and optional structure and content of XML formatted documents within its domain. See http://www.w3.org/XML/Schema.
XPath: Defines a way of locating and processing items in XML documents by using an addressing syntax based on the path through the document's logical tree structure. See http://www.w3.org/TR/xpath.
XQuery: XML Query Language. A W3C specification for querying XML formatted data.

12.3. History of Changes

The following changes were made to this document since its first public release.

Date: February, 24th, 2007

Date: February, 7th, 2007

Date: January, 22nd, 2007