Collection services: walkthrough with sample data set (draft for discussion)

The choice of the sample data set

The test data is a collection of longitude/latitude points tagged in a TEI-conformant text of Ptolemy's Geography. I have chosen this sample data in part because it allows me to experiment with three different implementations of the Collections API:

  1. XSLT transformations directly producing output conforming to the Collections services requirements.
  2. an XML:DB implementation using eXist.
    In this case, I load into eXist an XML representation of the data produced by a different set of XSLT transformations.
  3. a relational database implementation using Postgres.
    To import the data into Postgres, I first create an XML representation of the data with XSLT transformations, then use Hibernate to persist the data in the RDBMS.

Structure of the sample data

The first sample data set has a very simple structure, seen in the following Collection inventory document. Except for the manually added parts described below, the Collection inventory was generated automatically by an XSLT transformation of the Hibernate mapping file used to persist and work with parallel RDBMS and XML representations of the data.

The inventory includes a single collection, named lls, made up of instances of an object named record. These objects consist of the required id element, and a series of property elements: they do not contain any set elements. (I hope to add examples of sets in Collections services in a later version of this guide.) Each property element has name and type attributes.

In addition, for five properties, I have manually added an option element, indicating that the service supports the GetValidValues request for that property. For two properties with numeric data, I have manually included information about the valid range of values for that property. (See more on finding valid values below.)

At the time of this writing, the final XML namespace for the Collections schema has not yet been set; the example value http://bogus.chs.harvard.edu/ns/collections is a placeholder that should not be mistaken for a valid XML namespace.

The Collections inventory for the example data set:

<?xml version="1.0" encoding="UTF-8"?>
<coll:Collections xmlns:coll="http://bogus.chs.harvard.edu/ns/collections"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
    <coll:admin>Service administered by Neel Smith, nsmith_at_hc@yahoo.com</coll:admin>
    <coll:summary>Geographic data in Ptolemy's Geography.</coll:summary>
    <coll:collection name="lls">
        <coll:metadata>
           
            <!-- The other required DC metadata elements not present in
                 the Hibernate mapping file are not included in this example,
                 but should be added by hand here:
                 title, creator, coverage and rights
            -->
            <dc:description> Ptolemaic lon-lat data point to be analyzed for 
                RAGE project. @author Neel Smith </dc:description>
        </coll:metadata>
        <coll:structure name="record">
            <id name="id" type="string"/>
            <property name="verified" type="boolean"/>
            <property name="name" type="string"/>
            <property name="continent" type="string">
	      <option name="supportsValidValues"/>
	    </property>
            <property name="province" type="string">
	      <option name="supportsValidValues"/>
	    </property>
            <property name="geotype" type="string">
	      <option name="supportsValidValues"/>
	    </property>
            <property name="ethnic" type="string">
	      <option name="supportsValidValues"/>
	    </property>
            <property name="physical" type="string">
	      <option name="supportsValidValues"/>
	    </property>
            <property name="gklondeg" type="string"/>
            <property name="gklonfract" type="string"/>
            <property name="gklatdeg" type="string"/>
            <property name="gklatfract" type="string"/>
            <property name="declatdeg" type="number">
      	      <range max="64"/>
	    </property>
            <property name="declondeg" type="number">
      	      <range max="180" min="0"/>
	    </property>
            <property name="declatmin" type="number"/>
            <property name="declonmin" type="number"/>
            <property name="urn" type="string"/>
        </coll:structure>
    </coll:collection>
</coll:Collections>
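
A client might read an inventory like this one to find out which properties advertise GetValidValues support and which carry a range. The following is a minimal Python sketch using only the standard library. The names summarize_inventory and _bound, and the abbreviated inventory string, are illustrative assumptions and not part of the Collections services; the namespace is the placeholder used in this draft.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace used in this draft; not a final value.
COLL_NS = "http://bogus.chs.harvard.edu/ns/collections"

# Abbreviated copy of the inventory above (three properties only).
INVENTORY = """<coll:Collections xmlns:coll="http://bogus.chs.harvard.edu/ns/collections">
  <coll:collection name="lls">
    <coll:structure name="record">
      <id name="id" type="string"/>
      <property name="province" type="string">
        <option name="supportsValidValues"/>
      </property>
      <property name="declatdeg" type="number">
        <range max="64"/>
      </property>
      <property name="declondeg" type="number">
        <range max="180" min="0"/>
      </property>
    </coll:structure>
  </coll:collection>
</coll:Collections>"""


def _bound(range_elem, attr):
    """Return the min or max attribute as a float, or None if it is absent."""
    if range_elem is None or range_elem.get(attr) is None:
        return None
    return float(range_elem.get(attr))


def summarize_inventory(xml_text):
    """Map each property name to its type, range bounds and GetValidValues support."""
    root = ET.fromstring(xml_text)
    summary = {}
    for structure in root.iter("{%s}structure" % COLL_NS):
        # In the example, property elements carry no prefix, so they are
        # looked up without a namespace.
        for prop in structure.findall("property"):
            rng = prop.find("range")
            options = {opt.get("name") for opt in prop.findall("option")}
            summary[prop.get("name")] = {
                "type": prop.get("type"),
                "supports_valid_values": "supportsValidValues" in options,
                "min": _bound(rng, "min"),
                "max": _bound(rng, "max"),
            }
    return summary


for name, info in summarize_inventory(INVENTORY).items():
    print(name, info)
```

A missing min or max attribute is reported as None here, which is one way to represent a bound the inventory does not supply.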

Querying the sample data

The following sample shows the truncated output (including only the first two of forty-eight objects retrieved) of a Collections query.

The request was submitted with these parameters:

The reply to the request looked like this:

<QueryCollection>
    <request>
        <!-- open-ended structure -->
    </request>
    <results>
        <record>
            <id>pt-ll-1125</id>
            <verified>false</verified>
            <name>*LOU=NA</name>
            <continent>Europe</continent>
            <province>italia</province>
            <geotype>paralios</geotype>
            <ethnic>tusci</ethnic>
            <physical/>
            <gklondeg>LB</gklondeg>
            <gklonfract/>
            <gklatdeg>MB</gklatdeg>
            <gklatfract>½D</gklatfract>
            <declatdeg>32</declatdeg>
            <declondeg>42</declondeg>
            <declatmin>0</declatmin>
            <declonmin>45</declonmin>
            <urn>urn:cts:greekLit:tlg0363.tlg001.3.1.4</urn>
        </record>
        <record>
            <id>pt-ll-1126</id>
            <verified>false</verified>
            <name>*LOU=NA</name>
            <continent>Europe</continent>
            <province>italia</province>
            <geotype>paralios</geotype>
            <ethnic>tusci</ethnic>
            <physical/>
            <gklondeg>LB</gklondeg>
            <gklonfract/>
            <gklatdeg>MB</gklatdeg>
            <gklatfract/>
            <declatdeg>32</declatdeg>
            <declondeg>42</declondeg>
            <declatmin>0</declatmin>
            <declonmin>45</declonmin>
            <urn>urn:cts:greekLit:tlg0363.tlg001.3.1.4</urn>
        </record>
        <!-- results truncated:
        only 2 of 48 records included here -->
    </results>
</QueryCollection>
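
A reply of this shape can be turned into simple records on the client side. The sketch below, in Python with the standard library only, reads each record element into a dictionary keyed by property name. The function name parse_records is hypothetical, the sample reply is shortened to one record with a few of its properties, and the empty request element stands in for the open-ended request structure.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the reply above: one record, a few properties.
REPLY = """<QueryCollection>
    <request/>
    <results>
        <record>
            <id>pt-ll-1125</id>
            <verified>false</verified>
            <province>italia</province>
            <physical/>
            <declatdeg>32</declatdeg>
            <declonmin>45</declonmin>
        </record>
    </results>
</QueryCollection>"""


def parse_records(xml_text):
    """Return one dict per record element, mapping property name to text."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("./results/record"):
        # Empty elements such as <physical/> have no text; store "" for them.
        records.append({child.tag: (child.text or "").strip() for child in rec})
    return records


records = parse_records(REPLY)
print(records[0]["id"], records[0]["province"])
```

All values come back as strings in this sketch. A fuller client could use the type attributes in the Collections inventory to convert boolean and numeric properties.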

Finding valid values

Two optional requests help applications ensure that queries are formulated with valid values for a given property.

Boolean properties are defined to have only two possible values, represented in the request/reply protocol by the strings true and false.

Numeric properties may optionally indicate in the Collections inventory a range of valid values using the range element. Either or both of maximum and minimum values may be supplied in the max and min attributes. In the Collections inventory (above), a maximum has been supplied for the declatdeg property; both maximum and minimum values are given for the declondeg property.

GetValueRange

The GetValueRange request is a convenience method that allows applications to discover this information dynamically without having to request an entire (possibly large) Collections inventory. Here are two examples from the sample data set:

First example: request parameters:

First example: reply:

<GetValueRange>
    <request>
        <!-- open ended structure -->
    </request>
    <reply>
        <max>180</max>
        <min>0</min>
    </reply>
</GetValueRange>

Second example: request parameters:

Second example: reply:

<GetValueRange>
    <request>
        <!-- open ended structure -->
    </request>
    <reply>
        <max>64</max>
        <min/>
    </reply>
</GetValueRange>
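
An application could use such a reply to check a value before submitting a query. The Python sketch below parses the second reply and tests candidate values against it. It rests on two assumptions that this draft does not state: that an empty min or max element means no bound was supplied, and that the bounds are inclusive. The function names parse_range and in_range are hypothetical, and the empty request element stands in for the open-ended request structure.

```python
import xml.etree.ElementTree as ET

# The second reply above, with the request structure left empty.
SECOND_REPLY = """<GetValueRange>
    <request/>
    <reply>
        <max>64</max>
        <min/>
    </reply>
</GetValueRange>"""


def parse_range(xml_text):
    """Return (low, high) as floats, using None for a bound that is not given."""
    reply = ET.fromstring(xml_text).find("reply")

    def bound(tag):
        text = reply.findtext(tag)  # "" for an empty element, None if missing
        text = text.strip() if text else ""
        return float(text) if text else None

    return bound("min"), bound("max")


def in_range(value, low, high):
    """Check a value against optional, inclusive lower and upper bounds."""
    return (low is None or value >= low) and (high is None or value <= high)


low, high = parse_range(SECOND_REPLY)
print(in_range(42, low, high), in_range(65, low, high))
```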

GetValidValues

For properties with either string or numeric values, a service may optionally support the GetValidValues request. This request may be particularly useful for properties containing data in a controlled vocabulary. In the Collections inventory, the service administrator indicates the availability of this request for a given property by including an option element whose name attribute has the value supportsValidValues, as illustrated above.

The following example uses the sample data set:

Request parameters:

Reply:

<GetValidValues>
    <request>
        <!-- open ended structure -->
    </request>
    <reply>
    <value>achaea</value>
    <value>aethiopia-further</value>
    <value>aethiopia-nearer</value>
    <value>africa</value>
    <value>albania</value>
    <value>albion</value>
    <value>aquitania</value>
    <value>arabia-deserta</value>
    <value>arabia-felix</value>
    <value>arabia-petraea</value>
    <value>arachosia</value>
    <value>aria</value>
    <value>armenia-major</value>
    <value>armenia-minor</value>
    <value>asiaminor</value>
    <value>assyria</value>
    <value>babylonia</value>
    <value>bactria</value>
    <value>baetica</value>
    <value>belgica</value>
    <value>cappadocia</value>
    <value>carmania</value>
    <value>carmania-deserta</value>
    <value>cilicia</value>
    <value>colchis</value>
    <value>corsica</value>
    <value>crete</value>
    <value>cyprus</value>
    <value>cyrenaica</value>
    <value>dacia</value>
    <value>drangiana</value>
    <value>epirus</value>
    <value>galatia</value>
    <value>gedrosia</value>
    <value>germania</value>
    <value>hibernia</value>
    <value>hyrcania</value>
    <value>iazyges</value>
    <value>iberia</value>
    <value>illyria-dalmatia</value>
    <value>india-outside</value>
    <value>india-within</value>
    <value>italia</value>
    <value>libya-interior</value>
    <value>lowerpannonia</value>
    <value>lugdunensis</value>
    <value>lusitania</value>
    <value>lycia</value>
    <value>macedonia</value>
    <value>margiana</value>
    <value>marmarice</value>
    <value>mauritania-caesariensis</value>
    <value>mauritania-tingitana</value>
    <value>media</value>
    <value>mesopotamia</value>
    <value>mysia-lower</value>
    <value>mysia-upper</value>
    <value>narbonensis</value>
    <value>noricum</value>
    <value>palestine</value>
    <value>pamphylia</value>
    <value>paropanisadae</value>
    <value>parthia</value>
    <value>persis</value>
    <value>pontus-bithynia</value>
    <value>raetia-vindelica</value>
    <value>sacae</value>
    <value>sardinia</value>
    <value>sarmatia-asia</value>
    <value>sarmatia-europe</value>
    <value>scythia-beyond</value>
    <value>scythia-within</value>
    <value>serice</value>
    <value>sicilia</value>
    <value>sinae</value>
    <value>sogdia</value>
    <value>susiana</value>
    <value>syria-coele</value>
    <value>taprobane</value>
    <value>tarraconensis</value>
    <value>tauric-chersonnese</value>
    <value>thrace</value>
    <value>upperpannonia</value>
    </reply>
</GetValidValues>
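
A client can collect the returned values into a set and test a candidate query term against it. The Python sketch below shows one way to do that with the standard library. The function name valid_values is hypothetical, the reply is shortened to three of the values listed above, the empty request element stands in for the open-ended request structure, and the case-sensitive comparison is an assumption rather than a documented rule.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the reply above: three of the returned values.
REPLY = """<GetValidValues>
    <request/>
    <reply>
    <value>achaea</value>
    <value>italia</value>
    <value>thrace</value>
    </reply>
</GetValidValues>"""


def valid_values(xml_text):
    """Return the set of non-empty value strings found in the reply."""
    root = ET.fromstring(xml_text)
    return {v.text.strip() for v in root.findall("./reply/value") if v.text}


values = valid_values(REPLY)
# Membership is tested exactly as written, so "Italia" does not match "italia".
print("italia" in values, "Italia" in values)
```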

Links