KB Lab
Automatically extract XML content with Python
A quick-start guide to working with XML files using Python. The course covers various XML formats.
Web collection NL-blogosfeer
Metadata datasets and a collection description for the NL-blogosfeer, a collection of Dutch weblogs.
Historical newspapers OCR ground-truth
A dataset of 2000 pages of historical newspapers, comprising ground truth, OCR and images.
Entangled Histories: Ordinances of the Low Countries
This special collection is made up of 108 books of ordinances published in the Early Modern Era.
DBNL OCR Data set
This dataset consists of 220 texts digitised by the DBNL, in TEI and txt (OCR) formats.
Web Collection Chinese Netherlands
This web collection contains archived websites from the Chinese community in the Netherlands.
Web collection internet archaeology Euronet-Internet (1994-2017)
This web collection is made up of archived websites hosted by internet provider Euronet.
SIAMESET
The SIAMESET dataset consists of images and metadata of advertisements from two Dutch newspapers.
Newspaper ngram collection
This dataset contains yearly counts for word ngrams from the KB newspaper collection.
Example set
This collection consists of a small selection of our digitised publications from the years 1870-1871.
Keyword generator
A command-line tool to extract significant keywords from a collection of sample texts.
jpylyzer
Jpylyzer is a validator and feature extractor for JP2 (JPEG 2000 Part 1) images.
DBNL ngram viewer
An ngram viewer counting terms and phrases in the Digital Library of Dutch Literature (DBNL).