An update on assessing the impact of OCR quality in KB collections

Is an average OCR quality of 70% enough for my study? What OCR quality should we ask from external suppliers? Should we re-do the OCR of our collections to bring it from 80% to 85%? Libraries and researchers alike face the same dilemma in our times of textual abundance: when is OCR quality good enough? User access, scientific results and the investment of limited resources increasingly depend on answering this question.

This lecture will give an update on ongoing work aimed at assessing the impact of varying OCR quality in KB collections, focusing on Dutch texts. Attendees will get an overview of the challenge, why it is relevant and how it can be addressed, as well as the results we have so far.

The talk will be given by Giovanni Colavizza and is based on his work as our Researcher in Residence. He is an assistant professor of digital humanities at the University of Amsterdam and a visiting researcher at The Alan Turing Institute (UK) and at the Center for Science and Technology Studies (CWTS, Leiden University).

Date and time

The lecture starts at 13:00 on March 18th and ends around 13:30.

Registration

The lecture is public. To attend, please register via https://attendee.gotowebinar.com/register/7183682722999279373.

For more information about Giovanni, see his page on the KB Lab or read the blog post he wrote.
