Introduction
Textual features and metadata for DBNL novels 1800-2000
This dataset contains a corpus of 1346 novels from DBNL. Included are metadata and textual features such as word counts and syntactic features. The metadata includes variables related to canonicity: public library information, secondary references, Wikipedia mentions, etc.
The dataset consists of two parts:
- Textual features and metadata (open access): https://zenodo.org/record/5786254
- Parsed texts (restricted access): https://zenodo.org/record/5887620
The titles have been selected using the following criteria:
- Novels and novellas
- Originally written in Dutch
- First published 1800-2000
- TEI version of the title available on https://www.DBNL.org
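The selection criteria above amount to a simple filter over title-level metadata. The following is a minimal sketch of such a filter, not the procedure actually used to build the corpus. The column names (genre, original_language, first_published, has_tei), the sample rows, and the treatment of 1800 and 2000 as inclusive bounds are all assumptions made for illustration; the real metadata files may be organized differently.

```python
import csv
import io

# Hypothetical metadata excerpt. The column names and rows are invented
# for illustration and do not come from the actual dataset files.
SAMPLE = """title,genre,original_language,first_published,has_tei
Example novel A,novel,nl,1860,yes
Example novella B,novella,nl,1995,yes
Example translation C,novel,fr,1900,yes
Example poem D,poetry,nl,1950,yes
Example novel E,novel,nl,1780,yes
Example novel F,novel,nl,1920,no
"""


def meets_criteria(row):
    """Return True if a row satisfies all four selection criteria."""
    return (
        row["genre"] in {"novel", "novella"}                 # novels and novellas
        and row["original_language"] == "nl"                 # originally written in Dutch
        and 1800 <= int(row["first_published"]) <= 2000      # assumed inclusive range
        and row["has_tei"] == "yes"                          # TEI available
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
selected = [row["title"] for row in rows if meets_criteria(row)]
print(selected)
```

In this toy table only the first two rows pass all four checks; each of the other rows fails exactly one criterion.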
A searchable version of the list of novels and metadata is available.
Acknowledgements: Information from public libraries was contributed by Trudie Stoutjesdijk and Eddie de Kok from Data Warehouse.
Citation
When using this dataset, we ask you to cite it as follows:
Andreas van Cranenburgh, Sara Veldhoen, Michel De Gruijter (2022). Textual features and metadata for DBNL novels 1800-2000 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5786254.