This special web collection, Entangled Histories: Ordinances of the Low Countries, is made up of 108 books of ordinances published in the Low Countries (the Habsburg Netherlands and the Dutch Republic) during the early modern period. All texts included in this dataset had already been digitised, either through the Google Books Project or through individual digitisation projects of several libraries. The machine readability of these digitised books was improved with Transkribus’ Automatic Text Recognition (ATR): OCR with ABBYY FineReader v.11, and HTR models. These HTR models were manually trained on Ground Truth data, yielding dedicated models for Dutch Gothic print, Dutch Roman print and French Roman print.
Because copyright is claimed on the digitised images, this dataset contains only the transcriptions. These transcriptions are available in the following formats: ALTO, PAGE, XML, docx and txt. The ALTO and PAGE files have been compressed into .zip files, as they consist of one file per page.
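To work with the zipped ALTO files, a researcher will usually want the plain text of each page. The following is a minimal sketch using only the Python standard library. It assumes that every `.xml` member of the archive is one ALTO page, in which words are `String` elements carrying a `CONTENT` attribute, grouped into `TextLine` elements. The namespace is deliberately ignored because ALTO versions use different namespace URIs. The function names `alto_page_lines` and `read_alto_zip` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_page_lines(xml_bytes: bytes) -> list[str]:
    """Return one string per TextLine, joining the CONTENT of its String elements."""
    root = ET.fromstring(xml_bytes)
    lines = []
    for el in root.iter():
        if local_name(el.tag) == "TextLine":
            # SP (space) and HYP (hyphen) elements are skipped; words are joined with spaces.
            words = [s.get("CONTENT", "") for s in el.iter() if local_name(s.tag) == "String"]
            lines.append(" ".join(words))
    return lines


def read_alto_zip(zip_source) -> dict[str, list[str]]:
    """Map each .xml member of a zipped ALTO export to its text lines, in file-name order.

    Assumption: every .xml member is an ALTO page (hypothetical layout of the archive).
    zip_source may be a path or a binary file-like object.
    """
    pages = {}
    with zipfile.ZipFile(zip_source) as zf:
        for name in sorted(zf.namelist()):
            if name.lower().endswith(".xml"):
                pages[name] = alto_page_lines(zf.read(name))
    return pages
```

For example, `read_alto_zip("ordinances_alto.zip")` (a hypothetical file name) would return a dictionary from page file names to lists of text lines. Because the sketch ignores hyphenation markup, words broken across lines remain split, and the PAGE files would need their own parser.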
This project consisted of three phases, which are described in explanatory, in-depth blog posts:
1. improving the quality of the machine-readable texts to a <5% Character Error Rate (CER) by using HTR(+) instead of OCR;
2. segmenting the books of ordinances into individual legal texts;
3. machine-learned categorisation, based on a pre-trained set.
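The CER target in the first phase is conventionally defined as the character-level Levenshtein (edit) distance between a Ground Truth transcription and the recognised text, divided by the number of characters in the Ground Truth: CER = (S + D + I) / N, with S substitutions, D deletions, I insertions and N reference characters. The sketch below implements that standard definition in Python. It is not the Transkribus implementation, which may normalise whitespace or punctuation differently, and the example strings are invented for illustration.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions between ref and hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: ref character missing in hyp
                curr[j - 1] + 1,         # insertion: extra character in hyp
                prev[j - 1] + (r != h),  # substitution (costs 0 when characters match)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by the length of the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One wrong character in a 25-character line gives a CER of 0.04, i.e. below the 5% target.
print(cer("Placcaet ende Ordonnantie", "Placcaet eude Ordonnantie"))
```

Because CER is normalised by the reference length, the same single error weighs more on a short line: `cer("ordonnantie", "ordonnancie")` is 1/11, roughly 9%.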
Acknowledgement
This dataset was created while Annemieke Romein worked as Researcher-in-Residence at the KB, National Library of the Netherlands, on the Entangled Histories project. In creating this dataset, she was assisted by Sara Veldhoen and Michel de Gruijter of the Research Department of the KB.
The authors wish to thank Lotte Wilms, Steven Claeyssens, Martijn Kleppe, Jeroen Vandommele and Ronald Nijssen of the KB for their assistance in creating this dataset and making it available to the research community. Furthermore, we wish to thank Ghent University Library, the Bodleian Library and Utrecht University Library for providing us with the scans of additional books.
Articles about Entangled Histories
Under review:
- C.A. Romein, S. Veldhoen, M. de Gruijter (2019/2020), The Datafication of Early Modern Ordinances: ATR-ed Texts, Segmentation, and Categorisation.
Presentations were given in Brussels, Liège, Joensuu (Finland), Amsterdam and Ghent; posters were presented in Brussels (AYLH), Oxford (DHOxSS2019), Ghent (LW-Faculty Day), Liège (DHBenelux), Utrecht (DH2019) and Frankfurt (DLH-Conference).