As presented in "Digital publications for digital libraries," we view digital libraries as more than just collections of digitally published works. A digital library allows a collection of digitally published works to interoperate in terms of their fundamental semantics.
The architecture for these operations is a collection of network services, each working with a particular kind of data and enabling the operations appropriate to that kind of information.
In the first phase of our work, we have focused on four interrelated projects that collectively provide the infrastructure for:
(For planned work to add services for working with collections of information records, with digital images, and with geographic and spatial data, see an overview of phase 2.)
The CTS protocol joins an abstract notion of a textual work, independent of particular editions or versions (e.g., "the Histories of Herodotus"), with a notion of canonical citation schemes that can function across all versions of a notional work (e.g., "Hdt., 1.5"). The resulting service provides access to a text corpus organized by the traditional notion of "work" and citable by the traditional notion of "canonical reference."
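The relationship between a notional work, its versions, and a canonical reference can be sketched in a few lines of Python. This is only an illustration of the idea, not the CTS protocol itself: the class names `Work` and `Version`, the `cite` method, the version labels, and the placeholder passage strings are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Version:
    """One edition or translation of a work, keyed by canonical reference."""
    label: str
    passages: dict[str, str] = field(default_factory=dict)


@dataclass
class Work:
    """A notional work, independent of any particular version."""
    title: str
    versions: list[Version] = field(default_factory=list)

    def cite(self, reference: str) -> dict[str, str]:
        """Return the passage at `reference` from every version that has it."""
        return {
            v.label: v.passages[reference]
            for v in self.versions
            if reference in v.passages
        }


# Placeholder content only; these strings are not real text of Herodotus.
histories = Work(
    title="Histories of Herodotus",
    versions=[
        Version("greek-edition", {"1.5": "<Greek text of 1.5>"}),
        Version("english-translation", {"1.5": "<English text of 1.5>"}),
    ],
)

# The same canonical reference resolves in every version of the work.
result = histories.cite("1.5")
```

The point of the sketch is that the reference "1.5" belongs to the work rather than to any one edition, so a single citation can retrieve the corresponding passage from each version.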
(See more information.)
Programs following the CTS protocol need to be able to find canonical identifiers for text groups and works, so that they can recognize versions of the same work from different CTS services. More generally, for many applications we want a service that can provide standard identifiers for any kind of entity marked up within a text (persons or places, for example).
Registry services are similar to other name lookup services, but with two unusual features:
(See more information.)
URNs are an IETF standard for referring to persistent resources, independent of any location. (For example, there is a URN standard for ISBNs to identify editions of books, independent of any reference to where actual copies exist.)
The CTS URN project is developing a formal proposal for submission to the IETF, defining the semantics and a URN syntax for references to classical texts. This notation expresses the CTS notions of work and of citation in a single character string that can easily be used by any application wishing to refer to persistent text resources.
CTS URNs can point to units as small as one character in a document's native character set. Version 1.2 of the CTS protocol directly supports requests expressed in CTS URN notation.
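As a rough illustration of how an application might take such a string apart, the sketch below assumes a colon-delimited layout of the form `urn:cts:<namespace>:<work>:<passage>`, with the passage component optional. That layout is an assumption for this example; the proposal described here is still in development, and the definitive syntax is whatever the CTS URN specification states. The names `CtsUrn` and `parse_cts_urn` and the sample URN are likewise illustrative.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    namespace: str           # naming authority for the work identifiers
    work: str                # dot-separated hierarchy, e.g. text group and work
    passage: Optional[str]   # canonical citation such as "1.5"; None if absent


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a URN of the assumed form urn:cts:<namespace>:<work>[:<passage>]."""
    parts = urn.split(":")
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace=parts[2], work=parts[3], passage=passage)


# Illustrative URN: a work identifier plus the canonical reference "1.5".
cited = parse_cts_urn("urn:cts:greekLit:tlg0016.tlg001:1.5")
whole_work = parse_cts_urn("urn:cts:greekLit:tlg0016.tlg001")
```

Keeping the work and the citation as separate fields mirrors the two CTS notions the notation combines: `cited.work` identifies what is being cited, and `cited.passage` says where within it.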
(See more information.)
Reference Indexing services map any kind of non-CTS information onto text references, expressed as CTS URNs.
"Simple indexing" maps single atomic values onto a CTS URN. An index of persons could be implemented as a simple index, for example.
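A simple index of this kind behaves much like a lookup table from a value to one or more text references. The following sketch assumes an in-memory store; the class `SimpleIndex`, its methods, and the sample entries are hypothetical and are not drawn from any actual index or service interface. The URN strings assume a colon-delimited CTS URN layout and are for illustration only.

```python
from collections import defaultdict


class SimpleIndex:
    """Maps single atomic values (here, person names) onto CTS URNs."""

    def __init__(self) -> None:
        self._entries: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, value: str, urn: str) -> None:
        """Record that `value` is attested at the passage named by `urn`."""
        self._entries[value].add(urn)

    def lookup(self, value: str) -> list[str]:
        """Return every URN indexed under `value`, in sorted order."""
        return sorted(self._entries.get(value, ()))


# Illustrative entries only, not taken from a real index of persons.
persons = SimpleIndex()
persons.add("Croesus", "urn:cts:greekLit:tlg0016.tlg001:1.6")
persons.add("Croesus", "urn:cts:greekLit:tlg0016.tlg001:1.26")

croesus_refs = persons.lookup("Croesus")
```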
"Complex indexing" maps arbitrary XML data to a CTS URN. The XML data is accompanied by a namespace reference identifying a RELAX NG schema. For example, an index of full morphological analyses might be implemented as a complex index, with its data validating against a schema that describes morphological analyses.
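One way to picture a complex index is as a collection of XML fragments, each in a known namespace and each paired with a text reference. The sketch below makes several assumptions: the namespace URI, the element names `analysis`, `lemma`, and `pos`, the class `ComplexIndex`, and the sample URN are all hypothetical. It checks only that the root element is in the expected namespace; actual RELAX NG validation would need a separate validator and is omitted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace standing in for the reference to a RELAX NG schema.
MORPH_NS = "http://example.org/ns/morphology"


class ComplexIndex:
    """Maps XML fragments in one namespace onto CTS URNs."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._entries: list[tuple[ET.Element, str]] = []

    def add(self, xml_text: str, urn: str) -> None:
        """Parse `xml_text` and index it under `urn` (namespace check only)."""
        element = ET.fromstring(xml_text)
        if not element.tag.startswith("{" + self.namespace + "}"):
            raise ValueError("root element is not in the index namespace")
        self._entries.append((element, urn))

    def urns_where(self, tag: str, text: str) -> list[str]:
        """Return URNs whose XML data has a `tag` element containing `text`."""
        qualified = "{%s}%s" % (self.namespace, tag)
        return [
            urn
            for element, urn in self._entries
            if any(node.text == text for node in element.iter(qualified))
        ]


# Illustrative record; the element names and values are made up.
index = ComplexIndex(MORPH_NS)
index.add(
    f'<analysis xmlns="{MORPH_NS}">'
    "<lemma>example-lemma</lemma><pos>noun</pos>"
    "</analysis>",
    "urn:cts:greekLit:tlg0016.tlg001:1.5",
)

noun_refs = index.urns_where("pos", "noun")
```

Compared with a simple index, the indexed value here has internal structure, so queries can target individual fields of the analysis rather than a single atomic key.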
(See more information.)