Automatically extract XML content with Python

A quick-start guide to working with XML files using Python. The course covers various XML formats.

Dutch Novels 1800-2000

Dataset that contains a corpus of 1346 novels from DBNL.

DBNL OCR Data set

This data set consists of 220 texts digitised by the DBNL in TEI and txt (OCR).

Narralyzer

Narralyzer finds and visualises characters in texts and the relationships between them.

CHRONReader

With CHRONReader you can search in Delpher's newspaper images using categories and keywords.

SIAMESET

The SIAMESET dataset consists of images and metadata of advertisements from two Dutch newspapers.

Frame generator

Tool for extracting topics, keywords and their co-occurrence patterns from a Dutch corpus.

Europeana Newspapers NER

Data set for evaluating and training NER software on Dutch, French, German and Austrian newspaper text.

Ground-truth IMPACT project

Collection of 99.95% correct OCR text of books, newspapers, parliamentary papers and radio bulletins.

Python API

Simple API to access KB collections using Python.

Keyword generator

A command-line tool to extract significant keywords from a collection of sample texts.

Scansion generator

The Scansion generator is a tool developed to detect meter in Dutch poetry.

jpylyzer

Jpylyzer is a validator and feature extractor for JP2 (JPEG 2000 Part 1) images.