Guide to Collection services (draft for discussion)

Overview: the network service

A Collection service exposes one or more sets of objects with matching (but not necessarily identical) structures to network discovery and querying. These sets of objects are called Collections.

The following guide briefly describes the Collection Inventory that documents an instance of a Collections service, the structure of data in a collection, and how a collection can be queried. In parallel with this guide, a walkthrough of a sample data set in a Collections service provides examples from a real-world data set.

Collection Inventory

A Collections service documents its collections in an XML data source called a Collection Inventory. For each collection, the Collection Inventory includes both Dublin Core metadata elements and a description of the structure of the collection's objects, closely based on the mapping file used in the Hibernate relational persistence project. (For information about the Dublin Core Metadata Initiative, see the initiative's home page; for Hibernate, see the Hibernate project home page.)

A summary description and an administrative contact describe the service; these are followed by one or more collection elements, each divided into two sections: one for metadata and one for the structure of the collection's objects.

(See an example inventory in the accompanying "walkthrough.")
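To make this layout concrete, here is a minimal Python sketch that reads an inventory and reports, for each collection element, whether it has both a metadata section and a structure section. The element and attribute names used here (collectionsInventory, description, contact, collection, name, metadata, structure, id) are hypothetical placeholders, not the names defined in the protocol's Relax NG schema; the walkthrough shows a real inventory.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory: element and attribute names are illustrative only.
SAMPLE_INVENTORY = """
<collectionsInventory>
  <description>Sample Collections service</description>
  <contact>admin@example.org</contact>
  <collection name="places">
    <metadata>
      <title>Places</title>
    </metadata>
    <structure>
      <id>placeid</id>
    </structure>
  </collection>
</collectionsInventory>
"""


def summarize_inventory(xml_text):
    """Map each collection name to (has metadata section, has structure section)."""
    root = ET.fromstring(xml_text.strip())
    summary = {}
    for coll in root.findall("collection"):
        summary[coll.get("name")] = (
            coll.find("metadata") is not None,
            coll.find("structure") is not None,
        )
    return summary
```

Calling summarize_inventory(SAMPLE_INVENTORY) reports that the single "places" collection has both sections.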

Metadata

The metadata section consists of five elements taken from the Dublin Core metadata standard (http://dublincore.org/documents/dces/): title, creator, description, coverage, and rights. These may appear in any order, and the title and creator elements may be repeated.

Note that the Relax NG schema incorporated into the formal specification of the protocol accepts any foreign element in the metadata section. The rationale is to reuse the existing work of the Dublin Core Metadata Initiative while leaving open the possibility of incorporating other, perhaps project-specific, metadata without changing the schema used in the definition of the protocol. As a result, while the schema from the protocol definition will validate all documents conforming to the protocol definition, and can be used to validate the contents of the structure section, it cannot be used to validate the metadata section by itself.
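Because the schema alone cannot check the metadata section, a client may want its own check. The sketch below uses Python's standard xml.etree.ElementTree and assumes the elements use the standard Dublin Core element namespace (http://purl.org/dc/elements/1.1/). As an assumption, it treats all five elements as required; only title and creator may repeat, and foreign elements are skipped, mirroring the schema's permissive treatment of this section.

```python
import xml.etree.ElementTree as ET
from collections import Counter

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = ("title", "creator", "description", "coverage", "rights")
REPEATABLE = {"title", "creator"}


def check_metadata(metadata_xml):
    """Return a list of problems in a metadata section (empty list if none)."""
    root = ET.fromstring(metadata_xml)
    counts = Counter()
    for child in root:
        # ElementTree writes namespaced tags as "{namespace}localname".
        if child.tag.startswith("{" + DC_NS + "}"):
            counts[child.tag.split("}", 1)[1]] += 1
        # Any other element is foreign: accepted without checking.
    problems = []
    for name in DC_ELEMENTS:
        if counts[name] == 0:
            problems.append(f"missing dc:{name}")  # assumption: all five required
        elif counts[name] > 1 and name not in REPEATABLE:
            problems.append(f"dc:{name} repeated")
    return problems
```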

Structure

The structure of an object is defined in the collection's structure element: all objects in a given collection must match this definition.

Objects are made up of typed properties (corresponding roughly to fields in a relational database, or to elements in an XML structure) and sets (repeatable typed properties). Properties may in turn recursively contain properties, so although the resulting description is syntactically minimal, it can describe a wide range of objects.

This structure must include one root-level property whose value uniquely identifies each object in the collection; the inventory's id element identifies this property. The unique identifier may be of any of the logical data types defined by the Collections service.
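A minimal sketch of how such a recursive structure description might be modelled, assuming objects are represented as Python dicts. The Prop class, its fields, and the type names string, number and boolean are hypothetical stand-ins for the inventory's structure element and the service's logical data types. The sketch also assumes that properties other than the identifier are optional.

```python
from dataclasses import dataclass, field

# Hypothetical logical type names mapped to Python types for this sketch.
PY_TYPES = {"string": str, "number": (int, float), "boolean": bool}


@dataclass
class Prop:
    """A typed property; a property with children contains further properties."""
    name: str
    dtype: str = "string"
    repeatable: bool = False  # True models a set (repeatable typed property)
    children: list = field(default_factory=list)


def value_ok(value, prop):
    if prop.children:  # nested property: check recursively
        return isinstance(value, dict) and conforms(value, prop.children)
    if prop.dtype == "number" and isinstance(value, bool):
        return False  # bool is a subclass of int in Python
    return isinstance(value, PY_TYPES[prop.dtype])


def conforms(obj, props):
    """True if every property of obj is defined in props and correctly typed."""
    defined = {p.name: p for p in props}
    for name, value in obj.items():
        prop = defined.get(name)
        if prop is None:
            return False  # property not in the structure definition
        if prop.repeatable:
            if not isinstance(value, list):
                return False
            values = value
        else:
            values = [value]
        if not all(value_ok(v, prop) for v in values):
            return False
    return True


def collection_ok(objects, props, id_prop):
    """All objects conform and carry a unique value for the root-level id property."""
    if not all(id_prop in o and conforms(o, props) for o in objects):
        return False
    ids = [o[id_prop] for o in objects]
    return len(ids) == len(set(ids))
```

Modelling sets as lists and nested properties as dicts keeps the recursion in one place (value_ok), which matches the minimal, recursive character of the structure description.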

Required requests

A Collections service accepts requests submitted using HTTP GET. Every request includes an HTTP parameter named request, giving the name of the request; some requests accept further HTTP parameters.

Collections services are required to support the following four requests. (See the walkthrough of a sample data set in a Collections service for examples of requests and the data they return.)

    • name of request: GetCapabilities
      additional parameters: none
      returns: the CollectionsInventory document
    • name of request: GetMetadata
      additional parameters: CollectionID (required)
      returns: metadata describing one collection from the CollectionsInventory
    • name of request: DownloadCollection
      additional parameters: CollectionID (required)
      returns: the contents of the entire collection
    • name of request: QueryCollection
      additional parameters: CollectionID (required), QueryCollectionXPath (required), OrderByLocationXPath (optional)
      returns: a set of zero or more objects satisfying the query given by QueryCollectionXPath, expressed in the Collections service's XPath syntax; optionally, results may be ordered by the XPath location expression given by OrderByLocationXPath
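Because every request is an HTTP GET carrying a request parameter, building request URLs is mostly a matter of checking parameters. The sketch below uses Python's urllib.parse.urlencode. The base URL is hypothetical, and the optional ordering parameter is spelled OrderByLocationXPath, following the parameter list above.

```python
from urllib.parse import urlencode

# (required, optional) parameters for each required request.
REQUESTS = {
    "GetCapabilities": ((), ()),
    "GetMetadata": (("CollectionID",), ()),
    "DownloadCollection": (("CollectionID",), ()),
    "QueryCollection": (("CollectionID", "QueryCollectionXPath"),
                        ("OrderByLocationXPath",)),
}


def build_request_url(base_url, request, **params):
    """Build an HTTP GET URL for one of the required requests, checking parameters."""
    required, optional = REQUESTS[request]
    missing = [p for p in required if p not in params]
    if missing:
        raise ValueError(f"{request} requires {', '.join(missing)}")
    unknown = set(params) - set(required) - set(optional)
    if unknown:
        raise ValueError(f"{request} does not accept {', '.join(sorted(unknown))}")
    # The request name travels as an ordinary HTTP parameter named "request".
    return base_url + "?" + urlencode({"request": request, **params})
```

For example, build_request_url("http://example.org/cs", "GetMetadata", CollectionID="places") yields http://example.org/cs?request=GetMetadata&CollectionID=places; urlencode percent-encodes the slashes and brackets of XPath queries.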

Optional requests

Additionally, Collections services may support the GetValidValues and GetValueRange requests. Support for these optional requests is documented in the Collection Inventory (see the examples in the walkthrough).

    • name of request: GetValidValues
      additional parameters: CollectionID (required), CollectionXPath (required)
      returns: a list of all unique values satisfying the query given by CollectionXPath, expressed in the Collections service's XPath syntax
    • name of request: GetValueRange
      additional parameters: CollectionID (required), CollectionXPath (required)
      returns: the maximum and minimum values of the numeric property indicated by CollectionXPath
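To illustrate what the two optional requests compute, this sketch applies them to an in-memory list of objects. For simplicity it takes a plain property name where a real request takes a CollectionXPath expression. It sorts the unique values only to make the output stable; the protocol text does not say whether GetValidValues results are ordered.

```python
def get_valid_values(objects, prop):
    """Unique values of a property across a collection (sorted for stable output)."""
    return sorted({o[prop] for o in objects if prop in o})


def is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def get_value_range(objects, prop):
    """(minimum, maximum) of a numeric property, or None if no object has one."""
    values = [o[prop] for o in objects if is_number(o.get(prop))]
    if not values:
        return None
    return min(values), max(values)
```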

Querying the Collection

QueryCollection is the most frequently used request of a Collections service. Its QueryCollectionXPath parameter expresses a query in a syntax closely based on XPath: it uses XPath location expressions, but allows them to be qualified with filter expressions using operators defined by the Collections service protocol.

Because the structure of a collection is defined in XML and query results are expressed in XML as instances of that structure, it may seem natural to query the collection with XPath location paths. Unlike an XML document, however, in which document order is always preserved, a Collections service treats the order of objects in a result set as not significant by default. The QueryCollection request therefore accepts an optional parameter expressing, as a second XPath location path, the part of the object structure to use for ordering the results.
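The ordering behaviour can be sketched as follows. This assumes result objects are nested Python dicts and that the ordering path is a simple location path relative to the object (for example /loc/lat, a hypothetical property). Placing objects that lack the ordering property last is a design choice of this sketch, not something the protocol specifies.

```python
def resolve(obj, path):
    """Follow a simple location path such as '/loc/lat' through nested dicts."""
    value = obj
    for step in path.strip("/").split("/"):
        if not isinstance(value, dict) or step not in value:
            return None
        value = value[step]
    return value


def order_results(results, order_by_path=None):
    """Order query results by a location path; without one, order is not significant."""
    if order_by_path is None:
        return list(results)  # callers must not rely on this order
    present = [r for r in results if resolve(r, order_by_path) is not None]
    absent = [r for r in results if resolve(r, order_by_path) is None]
    return sorted(present, key=lambda r: resolve(r, order_by_path)) + absent
```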

Further, since the Collections service includes data types not defined in XPath, the Collections Service defines a series of type-specific operators that can be used in queries. Syntactically, the use of these operators mimics W3C XPath filter expressions.

Data types and operators

The following table summarizes the operators allowed in Collection queries, and gives examples from the data set used in the walkthrough of a sample data set in a Collections service.

Data type | Operator | Operation | Valid comparison values | Example of QueryCollectionXPath | Meaning of example
--- | --- | --- | --- | --- | ---
boolean | booleq | boolean equality test | true or false | /record[verified booleq true] | find /record objects where the verified property is true
string | eq | string equality test | single-quoted string literals | /record[ethnic eq 'tusci'] | find /record objects where the ethnic property exactly matches the string tusci
string | contains | literal substring match test | single-quoted string literals | /record[province contains 'india'] | find /record objects where the province property contains the string india (so finds both inner and outer India)
string | rematch | regular-expression match | single-quoted regular expressions | /record[gklatdeg rematch '^M[ABG]'] | find /record objects where the gklatdeg property matches the regular expression (since gklatdeg is a beta-code representation of an integer in Milesian-style numeric notation, this matches gklatdeg values corresponding to 41, 42, or 43)
number | = | numeric equality test | any numeric value (integer or decimal) | /record[declatdeg = 41] | find /record objects where the declatdeg property equals 41
number | lt | numeric less-than test | any numeric value (integer or decimal) | /record[declatdeg lt 41] | find /record objects where the declatdeg property is less than 41
number | gt | numeric greater-than test | any numeric value (integer or decimal) | /record[declatdeg gt 41] | find /record objects where the declatdeg property is greater than 41
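The table can be turned into a small evaluator for single-filter queries of the form /object[property operator value]. This Python sketch is not the service's own parser: the regular-expression grammar, the use of re.search for rematch (unanchored, so the pattern needs ^ to anchor at the start), case-sensitive contains, and the representation of objects as dicts are all assumptions for illustration. It writes the boolean test with the booleq operator named in the table.

```python
import re

# Hypothetical grammar for one filter: /object[property operator value]
QUERY_RE = re.compile(
    r"^/(\w+)\[(\w+)\s+(booleq|eq|contains|rematch|=|lt|gt)\s+(.+)\]$")


def parse_query(query):
    """Split a query into (object name, property, operator, typed comparison value)."""
    m = QUERY_RE.match(query.strip())
    if m is None:
        raise ValueError(f"unsupported query: {query!r}")
    obj, prop, op, raw = m.groups()
    raw = raw.strip()
    if op == "booleq":
        if raw not in ("true", "false"):
            raise ValueError("booleq compares against true or false")
        value = raw == "true"
    elif op in ("eq", "contains", "rematch"):
        if len(raw) < 2 or not (raw.startswith("'") and raw.endswith("'")):
            raise ValueError(f"{op} compares against a single-quoted literal")
        value = raw[1:-1]
    else:
        value = float(raw)  # integer or decimal literal; raises ValueError otherwise
    return obj, prop, op, value


def is_number(v):
    return isinstance(v, (int, float)) and not isinstance(v, bool)


# Each operator takes (property value, comparison value) and returns a bool.
OPERATORS = {
    "booleq": lambda v, x: isinstance(v, bool) and v == x,
    "eq": lambda v, x: isinstance(v, str) and v == x,
    "contains": lambda v, x: isinstance(v, str) and x in v,  # case-sensitive
    "rematch": lambda v, x: isinstance(v, str) and re.search(x, v) is not None,
    "=": lambda v, x: is_number(v) and v == x,
    "lt": lambda v, x: is_number(v) and v < x,
    "gt": lambda v, x: is_number(v) and v > x,
}


def query_collection(objects, query):
    """Return the objects whose property satisfies the filter, in input order."""
    _obj, prop, op, value = parse_query(query)
    return [o for o in objects if prop in o and OPERATORS[op](o[prop], value)]
```

For example, query_collection(records, "/record[declatdeg lt 41]") keeps only the records whose declatdeg is a number below 41; objects lacking the property, or holding a value of the wrong type, never match.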

Links