With great pleasure we introduce two new researchers-in-residence who have started their KB projects recently. They will be researching the KB special collection of LGBT+ websites and entity linking.
Jesper Verhoef is a postdoc in Digitization, Media and Popular Culture, Cultural Heritage, and Creative Industries. Earlier this year he began researching the special collection of LGBT+ websites archived by the Dutch National Library (KB). He wants to answer the question: 'What does the Dutch queer web sphere look like and how has it changed over time?' He does so by means of two computational, distant-reading methods: hyperlink analysis and Named Entity Recognition. He hopes that the project will not only further the emancipation of LGBT+ people, but will also result in workflows for researching the KB’s rich-yet-underused web archive and in semi-automated means of improving decisions about which websites to include in collections.
Vera Provatorova is a PhD student in Information Retrieval / Natural Language Processing at the IRLab (University of Amsterdam). Her researcher-in-residence project focuses on entity linking in digital collections. She has divided her research into two questions:
- To what extent is entity linking (EL) on Dutch-language digital collections affected by entity overshadowing, and how can we make EL systems robust against it?
- How can we make a Dutch-language EL system robust against OCR noise?
She sees three main challenges in connecting entities in the text layer of digital collections to knowledge bases (such as Wikidata): OCR noise, where entity mentions as well as contextual information are corrupted; entity overshadowing, where less frequent entities are mistaken for more popular entities with the same surface form; and a lack of resources and parametric knowledge compared to English-language models.
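To make entity overshadowing concrete, here is a minimal Python sketch. It is not Vera's method or any existing EL system: the candidate table, the mention counts, the keyword sets and both function names are hypothetical and invented purely for illustration. The first linker always picks the most frequent entity for a surface form, so the rarer entity is overshadowed; the second adds a crude context-overlap score that lets the rarer entity win when the surrounding text supports it.

```python
# Hypothetical candidate table: surface form -> {entity id: made-up mention count}.
CANDIDATES = {
    "Amsterdam": {
        "amsterdam_netherlands": 9500,  # popular entity
        "amsterdam_new_york": 40,       # rare entity with the same surface form
    },
}

# Hypothetical context keywords per entity (illustration only).
KEYWORDS = {
    "amsterdam_netherlands": {"netherlands", "dutch", "canal"},
    "amsterdam_new_york": {"mohawk", "york", "county"},
}


def link_by_prior(mention):
    """Pick the most frequent candidate; ignores context entirely."""
    candidates = CANDIDATES.get(mention)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def link_with_context(mention, context):
    """Rank by keyword overlap with the context, then by frequency as tie-breaker."""
    candidates = CANDIDATES.get(mention)
    if not candidates:
        return None
    tokens = set(context.lower().split())

    def score(entity):
        overlap = len(KEYWORDS.get(entity, set()) & tokens)
        return (overlap, candidates[entity])

    return max(candidates, key=score)


context = "The mill in Amsterdam on the Mohawk river in Montgomery County"
prior_choice = link_by_prior("Amsterdam")
context_choice = link_with_context("Amsterdam", context)
print(prior_choice)    # the popular entity, regardless of context
print(context_choice)  # the rarer entity, supported by the context words
```

Real EL systems use far richer context models than keyword overlap; the sketch only shows why a strong popularity prior can hide less frequent entities.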
Both researchers will present their results later, among other places in blog posts on the KB Lab. Jesper Verhoef has already published his first blog post: Analyzing the LGBT+ Web Archive: From Data Preservation to Preparation.
The researcher-in-residence programme
Since 2014, the KB has invited promising early-career scholars, through a call for proposals, to spend six months at its Research department. The researchers work on their own research using data from the KB. These data can come from born-digital collections or from large digitisation projects, for instance the digitised historical newspapers. They can also use tools or datasets found here on the KB Lab (often produced by previous researchers-in-residence). Together with KB employees, they try to answer their research question using computational techniques. The results, in most cases blogs, tools or datasets, are made available on the KB Lab so that other researchers can use the outcomes as well.
Previous researchers-in-residence used this programme to develop, for example, the Genre Classifier, which automatically recognizes genres in newspaper articles. We have also experimented with computer vision to research visual patterns in newspaper advertisements with SIAMESE, and assessed whether the OCR quality was good enough for research purposes.
Review of proposals 2023
The researchers-in-residence are selected on the basis of a submitted proposal. Each proposal is reviewed by a committee of senior researchers. This year’s committee for the researcher-in-residence call consisted of the following members:
- dr. Andreas van Cranenburgh (RUG)
- dr. Laura Hollink (CWI)
- dr. Dirk van Miert (Huygens ING/UU)
- dr. Nanne van Noord (UvA)
- prof. dr. Els Stronks (UU)
- dr. Andreas Weber (UT)