RDF over TAPIR
Roger Hyam
Abstract
The TDWG standards architecture relies on melding two technologies that are often thought to be antagonistic: the Resource Description Framework (RDF) and XML Schema.
RDF is based on a modelling language that describes everything in terms of subject-predicate-object statements (known as triples), a form that may be familiar from formal logic. RDF can be serialised in many ways, one of which is XML.
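As a rough illustration of how an XML serialisation carries triples, the following sketch reads a tiny RDF/XML document with Python's standard library and lists its subject-predicate-object statements. The example URIs under example.org, the choice of Dublin Core properties, and the `triples` helper are all assumptions made for illustration. The helper handles only top-level rdf:Description elements with simple property children, which is a small subset of RDF/XML.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Hypothetical instance data: the subject and object URIs are illustrative only.
RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/specimen/1">
    <dc:title>Specimen 1</dc:title>
    <dc:creator rdf:resource="http://example.org/person/42"/>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Return (subject, predicate, object) tuples for a simple RDF/XML document."""
    root = ET.fromstring(rdf_xml)
    result = []
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.get(f"{{{RDF_NS}}}about")
        for prop in node:
            # ElementTree tags look like "{namespace}local"; the predicate URI
            # is the namespace and the local name joined together.
            predicate = prop.tag[1:].replace("}", "", 1)
            resource = prop.get(f"{{{RDF_NS}}}resource")
            if resource is not None:
                obj = resource  # the object is a link to another resource
            else:
                obj = (prop.text or "").strip()  # the object is a literal
            result.append((subject, predicate, obj))
    return result


for triple in triples(RDF_XML):
    print(triple)
```

Each property element under the description becomes one statement: the element name supplies the predicate, and either the rdf:resource attribute or the text content supplies the object.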
XML Schema is a language for defining XML document structures. It is possible to define an XML document structure using XML Schema so that the resulting documents are valid serialisations of RDF.
TAPIR is a data exchange protocol designed to pass XML messages. The output from a TAPIR data provider is described using XML Schema. TAPIR knows nothing about RDF, but by using XML Schemas that define RDF instance documents, a TAPIR data provider can behave as an RDF data source. This is demonstrated using the TapirLink provider software.
One of the strengths of the TAPIR protocol is that it allows the definition of custom response types (output models). An output model can act as a mapping point between conceptual schemas. It should therefore be possible to map other TAPIR concepts into RDF that uses the TDWG ontology. This is demonstrated using data sources mapped to DarwinCore. It should also be possible to map any TAPIR data source to generic RDF.
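The idea of mapping a flat record to generic RDF can be sketched outside TAPIR itself. The following is a minimal illustration, not the TapirLink implementation or a TAPIR output model: it assumes a record arrives as a Python dictionary, and the `record_to_rdf` function, the `ex` vocabulary namespace, and the field names are all hypothetical. The caller states up front which fields hold resource links, so each value is written either as a literal or as an rdf:resource attribute, never both.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms#"  # hypothetical vocabulary namespace


def record_to_rdf(subject_uri, record, resource_fields=()):
    """Serialise one flat record (field -> value) as an RDF/XML string.

    Fields named in resource_fields are emitted as rdf:resource links;
    all other fields are emitted as literal text content.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": subject_uri}
    )
    for field, value in record.items():
        prop = ET.SubElement(desc, f"{{{EX_NS}}}{field}")
        if field in resource_fields:
            prop.set(f"{{{RDF_NS}}}resource", str(value))
        else:
            prop.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical record, as might come from one row of a flat table.
print(
    record_to_rdf(
        "http://example.org/specimen/1",
        {"scientificName": "Quercus robur", "collector": "http://example.org/person/42"},
        resource_fields=("collector",),
    )
)
```

A real mapping would use properties from an agreed vocabulary such as the TDWG ontology in place of the placeholder namespace.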
These approaches have several limitations. Defining RDF instance data using XML Schema is not ideal because XML Schema cannot make the use of an attribute conditional on whether the element has content. It therefore cannot prevent an rdf:resource attribute and embedded content from occurring on the same element, which would be illegal in RDF/XML. This is largely overcome in the demonstrations because it is known a priori whether a value is a literal or a resource link. XML Schema is also awkward to use when the instance document contains many namespaces: the current examples use around ten separate XML Schema documents. This could become a performance issue in the future and imposes an implementation burden on TAPIR wrapper software. The current examples use the TapirLink provider software, which does not implement complex internal data structures, only 'flat' tables. The PyWrapper TAPIR provider has been shown to support RDF in initial tests but has not been tested with the current examples.
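Because the schema cannot rule out the combination of rdf:resource and embedded content, one option is to check for it after the fact. The sketch below is an assumption about how such a check might look, not part of TAPIR, TapirLink or PyWrapper. The `find_conflicts` function and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_RESOURCE = f"{{{RDF_NS}}}resource"

# Hypothetical document: ex:collector wrongly carries both a link and text.
BAD_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://example.org/specimen/1">
    <ex:scientificName>Quercus robur</ex:scientificName>
    <ex:collector rdf:resource="http://example.org/person/42">A. Collector</ex:collector>
  </rdf:Description>
</rdf:RDF>"""


def find_conflicts(rdf_xml):
    """Return the tags of elements that have rdf:resource and also have content."""
    conflicts = []
    for elem in ET.fromstring(rdf_xml).iter():
        has_text = bool(elem.text and elem.text.strip())
        has_children = len(elem) > 0
        if RDF_RESOURCE in elem.attrib and (has_text or has_children):
            conflicts.append(elem.tag)
    return conflicts


print(find_conflicts(BAD_XML))
```

A check like this would run on the provider's output rather than in the schema, so it complements the XML Schema definitions without replacing them.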