Historical newspapers OCR ground-truth
A dataset consisting of 2,000 pages of historical newspaper ground truth, OCR and images.
The SIAMESET dataset consists of images and metadata of advertisements from two Dutch newspapers.
The KBK-1M Dataset is a collection of 1,603,396 images and accompanying captions from 1922 to 1994.
Europeana Newspapers NER
Dataset for evaluating and training NER software in Dutch, French, Austrian and German.
Ground-truth IMPACT project
Collection of 99.95% correct OCR of books, newspapers, parliamentary papers and radio bulletins.
This collection consists of a small selection of our digitised publications from the years 1870-1871.