KB Lab
Extracting text from EPUB files in Python
Johan van der Knijff published a brief introduction to extracting unformatted text from EPUB files.
Dutch Novels 1800-2000
Dataset that contains a corpus of 1346 novels from DBNL.
Canonizer
The Canonizer demonstrates how well canonicity can be classified based on the text of a novel.
Historical growth of the KB web archive
Description of the KB web archive and its growth since the KB started archiving.
Frame generator
Tool for extracting topics, keywords and their co-occurrence patterns from a Dutch corpus.
Python API
Simple API to access KB collections using Python.
ALTO Edit
ALTO Edit is a simple browser-based post-correction tool for ALTO XML files.