The IMPACT KB dataset was created to evaluate and train OCR software. For a selection of KB material, the original OCR and layout recognition output has been manually corrected to 99.95% accuracy, providing a 'perfect' result, also known as ground truth. The set consists of:
The set was made as part of the IMPACT project, a European-funded research project led by the KB. From 2008 to 2012, 26 partners worked together to make OCR for historical texts better, faster and cheaper. The project has concluded, but its resources and tools have been transferred to the IMPACT Centre of Competence.