The corpus of sejmik documents of the Polish-Lithuanian Commonwealth aims to gather all the resolutions (singular laudum, plural lauda in both Latin and Polish) of sejmiks up to 1795, from the editions that are legally available. The corpus also includes other types of documents related to sejmiks, if they have been published along with lauda: letters of kings and other officials addressed to sejmik assemblies, and documents issued by institutions related to sejmiks.
The major part of the corpus consists of instructions written by sejmiks for the sejm deputies and of fiscal regulations. The growing number of the latter is linked to the progressive transfer of treasury matters to sejmiks in the 17th century. The purpose of an instruction was to communicate to the deputy the will of the local community of noble citizens; he was supposed to conform to this will in the sejm. Since any noble (a member of the szlachta) could attend the most common type of sejmik (a sejmik ziemski) with an equal voice, sejmik documents should be considered materials for researching early modern direct democracy.
The corpus consists of documents in Polish and Latin. The majority of texts blend these two languages. An additional layer of linguistic (morphosyntactic) information was generated only for Polish.
The corpus is meant primarily as an auxiliary tool for historians. It focuses more on providing as many sources as possible and less on ensuring uniform orthography or manually checking the correctness of the texts (for economic reasons). Users should not rely on the full correctness of individual data; the corpus should be used only as an aid for research or for analyses of a statistical nature. The texts were read from images with OCR (Optical Character Recognition) software. A computer script was responsible for determining the dates and boundaries of documents. Its accuracy is markedly worse than that of a human.
For the linguistic processing of the texts, the models for the Morfeusz and Concraft programs were used, prepared in the Institute of Computer Science of the Polish Academy of Sciences for The Electronic Corpus of 17th- and 18th-century Polish Texts (Polish Baroque Corpus; Korba)[1][2]. More information on corpora of modern and historical Polish is available on the Baroque Corpus site. Although the sejmik corpus uses some of Korba’s tools, it is a completely separate resource, and the creators and administrators of the Baroque Corpus are not responsible for it.
Every fragment is annotated with information on the paper edition used and the range of pages where one can check a more authoritative version of the text (which is often available in digital libraries).
In version 0.8 (29.05.2021), the corpus contains over 4 million tokens (technical units related to words). It consists of the following collections of documents:
These are only six voivodships and lands (not all with full chronological coverage, even for the period after 1572). The tariff for the łanowe, podymne and czopowe taxes in 1629[3] recognized around 50 taxable voivodships and lands for the Polish Crown (Korona) alone, not counting Lithuania. Nevertheless, the selection present in the corpus is not entirely random. The Ruthenian voivodship had an especially large population and paid the most to the treasury. The Kraków voivodship, along with the voivodships of Great Poland (Wielkopolska), belongs to the so-called high voivodships (województwa górne), whose opinions were often followed by the others. More voivodships could be added to the corpus as their availability improves.
The aim of annotating authorship in the corpus is not so much to indicate the person who prepared the text of the document as to indicate the person or institution that is its official issuer. If no single institution or individual can be named, the annotation becomes szlachta [of the particular region, named in Polish] or, as a last resort, inny (other).
The authorship annotations were entered manually, but the segmentation of scanned editions into individual documents was performed by a computer program. This means that some documents are artificially split into smaller units (because some fragments were erroneously recognized as headers) or amalgamated into one. In the latter case, which seems to be rarer, the author is marked as inny. Sometimes, when the minority fragment is insignificant, the author is assigned according to the overwhelming majority of the text.
The texts of confederations and of resolutions made at common judicial assemblies (roki sądowe) and in military camps are, if possible, assigned to the sejmiks. The main test here is whether the text purports to speak in the name of all the nobles of the particular region (this is often accompanied by the formula My, rady, dygnitarze i całe rycerstwo X (or obywatele X), that is, We, the councils, dignitaries and all the knighthood of X, or citizens of X). The actual legitimacy of the given assembly is obviously not verified. If the document instead speaks for a number of persons mentioned by name, the authorship annotation takes a form such as szlachta województwa ruskiego.
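The fallback order of the authorship annotation described above can be summarised in a few lines of Python. This is only an illustrative sketch, not the procedure actually used for the corpus (the annotations were entered manually). The function name authorship_label and its parameters are hypothetical, and the exception for amalgamated documents dominated by one text is left out.

```python
from typing import Optional


def authorship_label(issuer: Optional[str],
                     region_genitive: Optional[str],
                     merged: bool = False) -> str:
    """Return an authorship annotation following the fallback order in the text.

    All names here are hypothetical; the corpus annotations were entered by hand.
    issuer          -- official issuer (person or institution), if one can be named
    region_genitive -- region name in the Polish genitive, e.g. "województwa ruskiego"
    merged          -- True if segmentation amalgamated several documents into one
    """
    if merged:
        # Amalgamated documents are marked as "inny" (other).
        return "inny"
    if issuer:
        # A single official issuer can be named.
        return issuer
    if region_genitive:
        # No single issuer: fall back to the nobility of the region.
        return f"szlachta {region_genitive}"
    # Last resort.
    return "inny"


print(authorship_label(None, "województwa ruskiego"))
```

Calling the function with no issuer and the region "województwa ruskiego" gives the annotation szlachta województwa ruskiego, the form mentioned in the paragraph above.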
[1] https://www.korba.edu.pl/overview (available in Polish and English)
[2] Włodzimierz Gruszczyński, Dorota Adamiec, Renata Bronikowska, Aleksandra Wieczorek. Elektroniczny Korpus Tekstów Polskich z XVII i XVIII w. – problemy teoretyczne i warsztatowe. „Poradnik Językowy” 8 (2020), pp. 32–51.
[3] A. Filipczak-Kocur, Sejm zwyczajny z roku 1629, Warszawa 1979, pp. 116–117.