Csv-42

Automatically extract XML content with Python

A quick-start guide to working with XML files in Python. The course covers various XML formats.

NL-Blogosphere

Web collection NL-blogosfeer

Metadata datasets and a collection description for the NL-blogosfeer, a collection of Dutch weblogs.

OCR scores

Historical newspapers OCR ground-truth

A dataset consisting of 2000 pages of historical newspaper ground truth, OCR, and images.

Example Dataset Entangled - French

Entangled Histories: Ordinances of the Low Countries

This special collection is made up of 108 books of ordinances published in the Early Modern Era.

DBNL OCR Data set

This data set consists of 220 texts digitised by the DBNL in TEI and txt (OCR).

Chinese Netherlands

Web Collection Chinese Netherlands

This web collection contains archived websites from the Chinese community in the Netherlands.

Euronet visualization

Web collection internet archaeology Euronet-Internet (1994-2017)

This web collection is made up of archived websites hosted by internet provider Euronet.

SIAMESET

The SIAMESET dataset consists of images and metadata of advertisements from two Dutch newspapers.

Newspaper ngram collection

This dataset contains yearly counts for word ngrams from the KB newspaper collection.

Example set

This collection consists of a small selection of our digitised publications from the years 1870-1871.

Keyword generator

A command-line tool to extract significant keywords from a collection of sample texts.

jpylyzer

Jpylyzer is a validator and feature extractor for JP2 (JPEG 2000 Part 1) images.

DBNL Ngram viewer

An ngram viewer counting terms and phrases in the Digital Library of Dutch Literature (DBNL).