26 Nov 2024

Towards robust entity linking and disambiguation on Dutch historical documents

Vera Provatorova was a researcher-in-residence at the KB in 2023.

Her project, 'Towards robust entity linking and disambiguation on Dutch historical documents', explored the question: to what extent is entity linking (EL) on Dutch-language archival data affected by entity overshadowing, and how can EL systems be made robust against it?

Project

Entity linking can enrich a dataset by connecting it to entities in a structured knowledge base. The process consists of identifying named entities (NEs) in a text, such as the names of people, places and organizations, disambiguating these NEs, and connecting each one to an entry in a knowledge base such as Wikidata. Dutch historical texts are especially challenging for automatic entity linking because of their historical language and the often imperfect OCR quality.
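To make these steps concrete, here is a minimal sketch in Python. It is not the method used in the project: the knowledge base, the identifiers (`toy:1` and so on), the popularity values, the keywords and the example sentence are all invented for illustration, and mention detection is reduced to matching known labels. The sketch also hints at entity overshadowing, understood here as a less common entity losing out to a more popular one that shares its name: a linker that relies only on popularity always picks the city, while a linker that also looks at the surrounding words can recover the ship.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str        # hypothetical identifier, NOT a real Wikidata QID
    label: str            # surface form used to match mentions
    keywords: frozenset   # toy context words associated with the entity
    popularity: float     # invented prior: how often the label refers to this entity


# Toy knowledge base; every entry and number is made up for this example.
TOY_KB = [
    Entity("toy:1", "Haarlem", frozenset({"city", "town", "municipality"}), 0.9),
    Entity("toy:2", "Haarlem", frozenset({"ship", "vessel", "sailed"}), 0.1),
    Entity("toy:3", "Leiden", frozenset({"city", "university"}), 1.0),
]


def find_mentions(text):
    """Step 1: naive mention detection by matching labels known to the KB."""
    labels = sorted({entity.label for entity in TOY_KB})
    pattern = r"\b(?:" + "|".join(re.escape(label) for label in labels) + r")\b"
    return re.findall(pattern, text)


def candidates(mention):
    """Step 2: collect every KB entity that shares the mention's surface form."""
    return [entity for entity in TOY_KB if entity.label == mention]


def link_by_popularity(mention):
    """Baseline disambiguation: always choose the most popular candidate."""
    return max(candidates(mention), key=lambda entity: entity.popularity)


def link_with_context(mention, text):
    """Step 3: prefer keyword overlap with the context, popularity breaks ties."""
    context_words = set(re.findall(r"\w+", text.lower()))

    def score(entity):
        return (len(context_words & entity.keywords), entity.popularity)

    return max(candidates(mention), key=score)


if __name__ == "__main__":
    sentence = "The ship Haarlem sailed at dawn"  # invented example sentence
    for mention in find_mentions(sentence):
        print(mention, "popularity only:", link_by_popularity(mention).entity_id)
        print(mention, "with context:", link_with_context(mention, sentence).entity_id)
```

In this toy setting the popularity-only baseline links "Haarlem" to the city entry, while the context-aware variant selects the ship entry. A real system would replace each step with trained components and a real knowledge base, and would also have to cope with spelling variation and OCR errors, which this sketch ignores.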

Vera Provatorova set out to tackle this challenge by linking entities in DBNL texts to Wikidata, as described in her blog post.

Author
Vera Provatorova
PhD student Information Retrieval / Natural Language Processing at the IRLab, Universiteit van Amsterdam
Bio
Vera Provatorova is a PhD student at IRLab, University of Amsterdam, with a background in applied mathematics and computer science and a passion for languages and history.