Historical newspapers OCR ground-truth

A dataset of 2,000 pages of historical newspapers, with ground truth, OCR output and page images.
Entangled Histories: Ordinances of the Low Countries

This special collection is made up of 108 books of ordinances published in the Early Modern Era.
Europeana Newspapers NER

A dataset for training and evaluating named-entity recognition (NER) software on Dutch, French, German and Austrian German text.
Ground-truth IMPACT project

A collection of 99.95% accurate OCR text from books, newspapers, parliamentary papers and radio bulletins.
Example set

This collection consists of a small selection of our digitised publications from the years 1870-1871.
KB Lab offers experimental tools and datasets built for and from the digital collections of the KB, the National Library of the Netherlands.