The “First Thousand Years of Greek” documentation

Creating the CHS Canon of Greek Literature

The necessary first step is to identify extant texts, how they are cited, and how their canonical citation scheme maps onto a TEI P5-compliant schema. The project currently manages this information in a relational database.
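A minimal sketch of how a relational store might record a work's citation scheme, using Python's built-in `sqlite3`. The table and column names (`work`, `citation_level`, `tei_elem`) and the sample Iliad entry are hypothetical illustrations, not the project's actual schema. The idea shown is that each level of a canonical citation (for example book.line) is paired with the TEI element that will encode it.

```python
import sqlite3

# Hypothetical schema: one row per work, plus one row per level of its
# canonical citation scheme recording which TEI P5 element encodes that level.
SCHEMA = """
CREATE TABLE work (
    work_id TEXT PRIMARY KEY,   -- hypothetical local identifier
    author  TEXT NOT NULL,
    title   TEXT NOT NULL
);
CREATE TABLE citation_level (
    work_id  TEXT NOT NULL REFERENCES work(work_id),
    depth    INTEGER NOT NULL,  -- 1 = outermost unit (e.g. book)
    label    TEXT NOT NULL,     -- name of the unit, e.g. 'book', 'line'
    tei_elem TEXT NOT NULL,     -- TEI element used for this unit
    PRIMARY KEY (work_id, depth)
);
"""


def citation_scheme(conn, work_id):
    """Return (label, tei_elem) pairs for a work, outermost level first."""
    rows = conn.execute(
        "SELECT label, tei_elem FROM citation_level "
        "WHERE work_id = ? ORDER BY depth",
        (work_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO work VALUES ('iliad', 'Homer', 'Iliad')")
conn.executemany(
    "INSERT INTO citation_level VALUES (?, ?, ?, ?)",
    [("iliad", 1, "book", "div"), ("iliad", 2, "line", "l")],
)
print(citation_scheme(conn, "iliad"))  # [('book', 'div'), ('line', 'l')]
```

Keeping one row per citation level, rather than a single "book.line" string, lets the same query drive any depth of scheme.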

Creating valid TEI P5 texts

Once the entries in the unique CHS Canon of Greek Literature have been constructed for a corpus of texts, a variety of utility programs use that data, together with a source of public-domain readings of ancient texts such as the TLG, to create TEI-conformant texts.
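The following sketch shows one way dotted canonical citations might be nested into TEI elements with Python's `xml.etree.ElementTree`. It assumes the canon supplies each work's citation scheme as a list of (level label, TEI element) pairs and that readings arrive as (citation, text) pairs; the function names `tei` and `build_tei_body` and both input formats are hypothetical. A complete TEI document would also need a `teiHeader` and the enclosing `TEI` and `text` elements, which are omitted here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag):
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei_body(levels, readings):
    """Nest (citation, text) readings into TEI elements under <body>.

    levels:   [(label, tei_elem), ...], outermost level first
    readings: [("1.1", "..."), ...], one dotted component per level
    """
    body = ET.Element(tei("body"))
    created = {}  # citation prefix (tuple) -> element already built
    for citation, text in readings:
        parts = citation.split(".")
        if len(parts) != len(levels):
            raise ValueError(f"{citation!r} does not fit the citation scheme")
        parent = body
        for depth, ((label, tei_elem), n) in enumerate(zip(levels, parts)):
            key = tuple(parts[: depth + 1])
            if key not in created:
                node = ET.SubElement(parent, tei(tei_elem), n=n)
                if tei_elem == "div":
                    node.set("type", label)  # e.g. <div n="1" type="book">
                created[key] = node
            parent = created[key]
        parent.text = text  # the innermost unit carries the reading
    return body


levels = [("book", "div"), ("line", "l")]
readings = [
    ("1.1", "μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"),
    ("1.2", "οὐλομένην, ἣ μυρί᾽ Ἀχαιοῖς ἄλγε᾽ ἔθηκε"),
]
body = build_tei_body(levels, readings)
print(ET.tostring(body, encoding="unicode"))
```

Rejecting citations whose depth does not match the scheme is a deliberate choice here: a mismatch usually signals an error in either the canon entry or the source text, and is better surfaced than silently encoded.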

Creating a database of strings and lexical identifiers

Lemmatized indexing involves mapping a surface string (an inflected form) to a lexical entity identified by a unique identifier. Before we can create a lemmatized index, we need to establish the set of identifiers for lexical entities.

  1. Script to format data from Peter Heslin's "expert-data" package as a tabular mapping of surface strings (inflected forms) to the lemma strings used by the Perseus project's Morpheus system to label a lexical entity
  2. Script to extract from the Perseus project's electronic LSJ the unique identifiers for each article, together with the lemma used by LSJ to label that lexical entity
  3. Output from the previous two steps has to be reviewed and unified so that morphological identifications can be expressed with unique identifiers rather than with the lemma strings used as labels
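The unification in step 3 can be sketched as a join on lemma strings. This minimal Python sketch assumes both sources have already been converted to Unicode Greek, uses a hypothetical normalization (NFC plus case-folding), and invents the identifier `lsj-001` for illustration; the function names `normalize` and `unify` are likewise hypothetical. A surface form is mapped to an LSJ identifier only when its lemma matches exactly one LSJ article; unmatched or ambiguous lemmas are set aside for manual review.

```python
import unicodedata
from collections import defaultdict


def normalize(lemma):
    """Hypothetical lemma normalization: Unicode NFC plus case-folding."""
    return unicodedata.normalize("NFC", lemma).casefold()


def unify(form_lemma_rows, lsj_rows):
    """Re-key a form -> lemma table on LSJ article identifiers.

    form_lemma_rows: [(surface_form, morpheus_lemma), ...]  (output of step 1)
    lsj_rows:        [(lsj_id, lsj_lemma), ...]             (output of step 2)
    Returns ({surface_form: {lsj_id, ...}}, sorted lemmas needing review).
    """
    ids_by_lemma = defaultdict(set)
    for lsj_id, lemma in lsj_rows:
        ids_by_lemma[normalize(lemma)].add(lsj_id)

    form_to_ids = defaultdict(set)
    needs_review = set()
    for form, lemma in form_lemma_rows:
        ids = ids_by_lemma.get(normalize(lemma), set())
        if len(ids) == 1:
            form_to_ids[form] |= ids   # unambiguous: exactly one LSJ article
        else:
            needs_review.add(lemma)    # no match, or several homographs
    return dict(form_to_ids), sorted(needs_review)


forms = [("λόγου", "λόγος"), ("λόγοι", "λόγος"), ("ἦν", "εἰμί")]
lsj = [("lsj-001", "λόγος")]  # hypothetical identifier
mapping, review = unify(forms, lsj)
print(mapping)  # {'λόγου': {'lsj-001'}, 'λόγοι': {'lsj-001'}}
print(review)   # ['εἰμί']
```

Sending homographs to review rather than guessing between them keeps the automated pass conservative, leaving the judgment calls of step 3 to a human reviewer.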

Creating a lemmatized index of P5-compliant texts