Vera Provatorova is a PhD student at IRLab, University of Amsterdam, with a background in applied mathematics and computer science and a passion for languages and history. Her research interests are at the intersection of natural language processing, information extraction and digital humanities, with the goal of making digitised historical corpora easily searchable. Vera's current research focus is entity linking, and particularly the most challenging cases of this task: noisy data, unseen entities and entity overshadowing.
Her Research-in-Residence project focuses on entity linking (EL) in Dutch archives. She has divided her research into two questions:
- To what extent is entity linking on Dutch-language archival data affected by entity overshadowing, and how can we make EL systems robust against it?
- How can we make a Dutch-language EL system robust against OCR noise?
She sees three main challenges in connecting entities from scanned Dutch archival text to knowledge bases such as Wikidata: OCR noise, where entity mentions as well as their surrounding context are corrupted; entity overshadowing, where less frequent entities are mistaken for more popular entities with the same surface form; and a lack of resources and parametric knowledge compared to English-language models.