Clockwork picture of an itinerant dentist performing an extraction in a French rural scene; wood frame, metal workings, first half of the 19th century. Science Museum, London. Attribution 4.0 International (CC BY 4.0) (cropped from original).

Extracting text from EPUB files in Python

Johan van der Knijff published a brief introduction to extracting unformatted text from EPUB files.
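The introduction itself is not reproduced here. As a rough illustration of the general approach, and not necessarily the one van der Knijff describes, the sketch below uses only the Python standard library. An EPUB is a ZIP archive whose `META-INF/container.xml` points to a package (OPF) file, and the OPF spine lists the XHTML content documents in reading order. The function names `html_to_text` and `epub_to_text` are hypothetical, and the sketch assumes DRM-free files with UTF-8 content documents.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import unquote

# Namespaces of the EPUB container file and the OPF package document.
NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


class _TextCollector(HTMLParser):
    """Collects character data, skipping <head>, <script> and <style>."""

    SKIP_TAGS = {"head", "script", "style"}
    BLOCK_TAGS = {"p", "div", "br", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities such as &amp; are decoded
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")  # block elements start on a new line

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip:
            self._skip -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(markup):
    """Return the unformatted text of one (X)HTML content document."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


def epub_to_text(source):
    """Extract the text of an EPUB (a path or binary file object) in spine order."""
    with zipfile.ZipFile(source) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(zf.read(opf_path))
        base = posixpath.dirname(opf_path)  # manifest hrefs are relative to the OPF
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.findall("opf:manifest/opf:item", NS)
        }
        chapters = []
        for itemref in opf.findall("opf:spine/opf:itemref", NS):
            href = unquote(manifest[itemref.get("idref")])
            # "utf-8-sig" also strips a byte-order mark if one is present.
            markup = zf.read(posixpath.join(base, href)).decode("utf-8-sig")
            chapters.append(html_to_text(markup))
    return "\n\n".join(chapters)
```

Calling `epub_to_text("book.epub")` (a placeholder file name) returns a single string with the content documents separated by blank lines. Following the spine rather than listing the archive members keeps the chapters in reading order.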

Dutch Novels 1800-2000

A dataset containing a corpus of 1,346 novels from the DBNL.

Canonizer

The Canonizer demonstrates how well a novel's canonicity can be classified on the basis of its text.

Ot & Sien dataset

Data for developing automatic visual object recognition tools for children's books.

Is your OCR good enough?

Comprehensive assessment of the impact of OCR quality in Dutch newspaper, journal and book collections.

RDA Entity Finder

The RDA Entity Finder enables browsing through several bibliographic entities.

Entangled Histories: Ordinances of the Low Countries

This special collection is made up of 108 books of ordinances published in the Early Modern Era.

Frame generator

A tool for extracting topics, keywords and their co-occurrence patterns from a Dutch corpus.
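How the Frame generator itself detects co-occurrence is not described here. To illustrate what a co-occurrence pattern is, this sketch counts how often pairs of given keywords appear in the same sentence. The sentence splitting is a naive regular expression (an assumption; a real pipeline for Dutch would use a proper tokenizer), and `cooccurrences` is a hypothetical name.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrences(text, keywords):
    """Count how often each pair of keywords occurs in the same sentence."""
    keywords = {k.lower() for k in keywords}
    counts = Counter()
    # Naive split after '.', '!' or '?' followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = set(re.findall(r"\w+", sentence.lower()))
        present = sorted(tokens & keywords)  # sorted, so each pair has one key
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts
```

Each sentence counts a pair at most once, so the counts reflect how many sentences link two keywords rather than how often the words repeat.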

Ground-truth IMPACT project

A collection of 99.95% correct OCR text of books, newspapers, parliamentary papers and radio bulletins.
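OCR accuracy percentages are commonly expressed at the character level: one minus the character error rate (CER), where CER is the Levenshtein edit distance between the OCR output and the ground truth divided by the length of the ground truth. The sketch below computes that measure; it is an illustration only, not necessarily how the IMPACT figure was determined, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_accuracy(ocr, ground_truth):
    """1 - CER; clamped at 0 when the OCR output is much longer than the ground truth."""
    if not ground_truth:
        return 1.0 if not ocr else 0.0
    return max(0.0, 1 - levenshtein(ocr, ground_truth) / len(ground_truth))
```

For example, the typical long-s misreading "Amfterdam" for "Amsterdam" is one substitution in nine characters, giving a character accuracy of about 88.9%.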

Example set

This collection consists of a small selection of our digitised publications from the years 1870-1871.

Keyword generator

A command-line tool to extract significant keywords from a collection of sample texts.
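How the Keyword generator decides which keywords are significant is not stated here. One common technique, used in this sketch as an assumption, compares word frequencies in the sample texts with a reference corpus using Dunning's log-likelihood statistic, G² = 2 Σ O ln(O / E), where O and E are the observed and expected counts in each corpus. The function names are hypothetical and the tokenizer is deliberately simple.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def log_likelihood(a, b, c, d):
    """Dunning's G2 for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the target corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a:
        g2 += a * math.log(a / e1)
    if b:
        g2 += b * math.log(b / e2)
    return 2 * g2


def significant_keywords(target_texts, reference_texts, top=10):
    """Rank words that are over-represented in the target texts by G2."""
    target = Counter(t for text in target_texts for t in tokenize(text))
    reference = Counter(t for text in reference_texts for t in tokenize(text))
    c, d = sum(target.values()), sum(reference.values())
    if not c or not d:
        raise ValueError("both the target and the reference texts must contain words")
    scored = []
    for word, a in target.items():
        b = reference[word]  # 0 if the word never occurs in the reference
        # Keep only words relatively more frequent in the target than the reference.
        if a / c > b / d:
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [word for _, word in scored[:top]]
```

Filtering on relative frequency matters because G² is also high for words that are significantly *under*-represented in the sample texts, which are not useful as keywords.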

ALTO Edit

ALTO Edit is a simple browser-based post-correction tool for ALTO XML files.
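ALTO XML stores each recognized word as a `String` element whose `CONTENT` attribute holds the OCR text, so post-correction ultimately means rewriting those attributes. The following is a minimal sketch with Python's `xml.etree.ElementTree`, not ALTO Edit's own code. It matches elements by local name so it works regardless of which ALTO namespace version a file declares, and it assumes the `String` elements carry the optional `ID` attribute. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the namespace from an ElementTree tag: '{ns}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def _strings(root):
    """Yield every ALTO String element in document order."""
    for el in root.iter():
        if isinstance(el.tag, str) and local(el.tag) == "String":
            yield el


def alto_words(root):
    """Return (ID, CONTENT) for every String element."""
    return [(el.get("ID"), el.get("CONTENT")) for el in _strings(root)]


def correct_word(root, string_id, new_content):
    """Set the CONTENT of the String with the given ID; return True if it was found."""
    for el in _strings(root):
        if el.get("ID") == string_id:
            el.set("CONTENT", new_content)
            return True
    return False
```

Before writing the corrected tree back, call `ET.register_namespace("", ns)` with the file's ALTO namespace; otherwise ElementTree serializes the elements with an `ns0:` prefix.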

PoliMedia

Enables cross-media analysis of the coverage of parliamentary debates through a uniform search interface.