We are very happy to share that Giovanni Colavizza and Seyran Khademi are our researchers-in-residence this year. Giovanni is examining how the quality of OCR impacts downstream tasks. Seyran will be developing an algorithm that recognises images without distinguishing between historical and modern ones.
What can you do with bad quality OCR?
As most researchers know, OCR does not always give the best results. In his project ‘Is your OCR good enough? A comprehensive assessment of the impact of OCR quality on downstream tasks’, Giovanni will examine, together with KB Data Scientist Mirjam Cuper, how the quality of OCR impacts further analysis. They are using data from Delpher and DBNL and hope to learn which tasks work better with good OCR and where OCR quality has less impact.
How can we search through modern and historical images at the same time?
In her project ‘DepTH: Deep Training on History’, Seyran will work on the development of a computer vision algorithm that can work with historical and modern images simultaneously. Computer vision applications normally group images based on visual characteristics such as black/white or many pixels (photos) versus few pixels (illustrations). As a result, older images are separated from modern ones, which is not what we want. Together with research software engineer Sara Veldhoen, Seyran will work on images extracted from Delpher's books to develop a new algorithm.
Giovanni is in the final stages of his research and will conclude his project in September 2020. Seyran will begin her work with us on 1 August 2020.