(Semi-) Automatic Cataloguing of Textual Cultural Heritage Objects

Juliette Lonij, Sara Veldhoen and Martijn Kleppe

In collaboration with Iris Hendrickx, Radboud University

The KB | National Library of the Netherlands has been digitizing its collections at a rapid pace for a number of years now. Large amounts of scans and machine-readable text created from historical newspapers, periodicals, books and other sources are made available to end users through portals such as Delpher. At the same time, the amount of content deposited by publishers or harvested from the web in digital form, such as e-books, e-journals, and web pages, is growing quickly as well.

Rich and accurate descriptive metadata, ranging from title and author on the one hand to specialist scientific subject headings on the other, are an essential prerequisite for enabling users to navigate these collections effectively. The current practice of creating such metadata manually, however, has become prohibitively time-consuming and, in some cases, prone to error. We therefore invite researchers to explore possibilities for automatically extracting relevant metadata from the objects in our digitized and born-digital collections, using methods and techniques from the field of Artificial Intelligence, in particular the subfields of Machine Learning and Natural Language Processing.

ICT with Industry workshop on automated metadata