Texts included in the Corpus


The Text ID column gives the unique, conventional identifier of each document, constructed from the region and a consecutive number. The Author is the institution that issued the document; the Region is the relevant administrative unit within the Commonwealth. These data were assigned on the basis of the source editions from which the texts were extracted.

The Title and Date columns present data that were extracted automatically from the scanned books, and are filled in only for the documents where the extraction succeeded. The titles are given in the form adopted by the editors, who usually composed them themselves. The dates are in the format (year)-(month)-(day): the automatic detection system takes the first date appearing in the title or in the text. This is most often the official date of the original document.
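The "first date wins" rule described above can be illustrated with a minimal sketch. It is not the detection system actually used for the corpus. The function name first_date, the numeric day.month.year pattern, the choice to search the title before the text, and the zero-padded output are all assumptions made for the example; the real documents may well write dates out in words.

```python
import re
from typing import Optional

# Assumption: dates appear as digits in day.month.year order, e.g. "17.05.1632".
# This pattern is illustrative only and is not taken from the corpus tooling.
DATE_PATTERN = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")


def first_date(title: str, text: str) -> Optional[str]:
    """Return the first date found in the title, otherwise in the text.

    The result is formatted as year-month-day (zero-padded here by assumption).
    Returns None when no date is detected, which corresponds to an empty Date cell.
    """
    for source in (title, text):
        match = DATE_PATTERN.search(source)
        if match:
            day, month, year = match.groups()
            return f"{year}-{int(month):02d}-{int(day):02d}"
    return None
```

With these assumptions, first_date("Laudum 17.05.1632", "written 1.1.1600") returns "1632-05-17", because the date in the title is found before the one in the text. This also shows why the detected date is only "most often" the official one: any earlier date mentioned in passing would be picked up instead.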


Text ID | Author | Title | Region | Date | Number of tokens