ALTO Edit is a simple browser-based post-correction tool for ALTO XML files. The tool was developed in 2012 for an OCR post-correction project and uses the so-called side-by-side method, in which the user sees the scan on the left side of the screen and the OCR text on the right.
The tool contains a search-and-replace function to correct recurring errors throughout the text automatically. Small segmentation errors can be corrected in a separate window, which becomes available when you type a space in a word or empty a field.
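ALTO XML stores each recognised word as a String element whose CONTENT attribute holds the OCR text, so a search-and-replace over the text comes down to rewriting those attributes. The Python sketch below only illustrates that idea and is not ALTO Edit's own code: it assumes the ALTO v2 namespace, exact whole-word matching, and a small hypothetical sample page with a recurring error.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v2 namespace; other ALTO versions use a different URI.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v2#"
STRING = f"{{{ALTO_NS}}}String"

# Hypothetical page fragment with a recurring OCR error ("Tbe" for "The").
SAMPLE = f"""<alto xmlns="{ALTO_NS}">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine ID="line_1" HPOS="100" VPOS="200" WIDTH="600" HEIGHT="40">
      <String ID="s1" CONTENT="Tbe" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="40"/>
      <SP HPOS="180" VPOS="200"/>
      <String ID="s2" CONTENT="Hague" HPOS="190" VPOS="200" WIDTH="120" HEIGHT="40"/>
    </TextLine>
    <TextLine ID="line_2" HPOS="100" VPOS="250" WIDTH="600" HEIGHT="40">
      <String ID="s3" CONTENT="Tbe" HPOS="100" VPOS="250" WIDTH="80" HEIGHT="40"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def search_and_replace(root, search, replacement):
    """Set CONTENT to `replacement` on every String whose CONTENT equals `search`.

    Returns the number of words changed. Matching is exact and whole-word.
    """
    changed = 0
    for string in root.iter(STRING):
        if string.get("CONTENT") == search:
            string.set("CONTENT", replacement)
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
print(search_and_replace(root, "Tbe", "The"))          # 2
print([s.get("CONTENT") for s in root.iter(STRING)])   # ['The', 'Hague', 'The']
```

Whole-word matching keeps a correction such as "Tbe" to "The" from touching longer words that merely contain the same letters; a substring-based replace would be an equally valid design if partial matches are wanted.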
The tool and installation instructions are available on GitHub.
Update 16/03/2022: the live demo page has been updated with a new version of ALTO Edit (2.0), which also has a new GitHub repository. The old ALTO Edit demo no longer works.
If you use ALTO Edit, please cite it as follows:
Ark, R. van der, ALTO Edit (2012). KB Lab: The Hague http://lab.kb.nl/tool/alto-edit
ALTO Edit is a relatively simple tool for OCR correction. Once a text is uploaded, the scan of each page is displayed separately on the left side of the tool. On the right side, you see the OCR text divided into lines and words. Click on a line to start correcting; the image on the left automatically zooms into the line you selected. You can change the view of the image by clicking '+' and '-' at the top left of the tool. The '[=]' button zooms out to show the whole page again.
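In ALTO XML every TextLine records its position on the scan in the HPOS, VPOS, WIDTH and HEIGHT attributes, which is the kind of information a line-by-line zoom needs. The Python sketch below computes a padded crop box for one line. It is not taken from ALTO Edit's source; it assumes the ALTO v2 namespace and pixel coordinates (ALTO files can also declare mm10 or inch1200 units), and the 20-pixel margin is an arbitrary illustrative value.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v2 namespace; other ALTO versions use a different URI.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v2#"

# Hypothetical one-line page; coordinates are assumed to be in pixels.
SAMPLE = f"""<alto xmlns="{ALTO_NS}">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine ID="line_1" HPOS="100" VPOS="200" WIDTH="600" HEIGHT="40">
      <String ID="s1" CONTENT="Hague" HPOS="100" VPOS="200" WIDTH="120" HEIGHT="40"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def line_crop_box(root, line_id, margin=20):
    """Return (left, top, right, bottom) around the TextLine `line_id`.

    `margin` pads the box so the zoomed view shows some context around the
    line; left and top are clamped at 0, right and bottom are not clamped.
    """
    for line in root.iter(f"{{{ALTO_NS}}}TextLine"):
        if line.get("ID") == line_id:
            hpos, vpos = float(line.get("HPOS")), float(line.get("VPOS"))
            width, height = float(line.get("WIDTH")), float(line.get("HEIGHT"))
            return (max(0.0, hpos - margin), max(0.0, vpos - margin),
                    hpos + width + margin, vpos + height + margin)
    raise KeyError(f"no TextLine with ID {line_id!r}")


root = ET.fromstring(SAMPLE)
print(line_crop_box(root, "line_1"))  # (80.0, 180.0, 720.0, 260.0)
```

The returned tuple follows the (left, upper, right, lower) order that Pillow's Image.crop accepts as its box argument, so it could be used directly to cut the selected line out of the page image.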
Correct any words that have been misrecognised. If you spot a segmentation error, empty the field when one word has been split across two fields, or type a space when two words have been joined into one field. Once you close the edit field by pressing 'Enter', three red exclamation points appear next to the line. Click on them to open the segmentation window.
In the segmentation window, an extra button appears in the line text, under the word you want to correct. Click the button and place the black marker where you want to split the word. A field you have emptied is shown in red: click on it and then click 'Verwijder dit segment' ("Remove this segment") to remove the whole field. Do remember to type the missing text into the field where it belongs! Once you have completed a page, click 'Opslaan' ("Save") to go to the next page.
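In ALTO terms, a segmentation fix changes the structure of the file rather than just a CONTENT attribute: splitting a word adds a String element (with an SP space element between the two halves), and removing an emptied field deletes one. The Python sketch below illustrates both operations on a hypothetical line. It is not ALTO Edit's implementation; it assumes the ALTO v2 namespace and integer pixel coordinates, divides the word's width in proportion to its characters as a rough stand-in for the marker position on the scan, and uses a made-up ID scheme for the new String.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v2 namespace; other ALTO versions use a different URI.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v2#"
STRING = f"{{{ALTO_NS}}}String"
SP = f"{{{ALTO_NS}}}SP"

# Hypothetical line in which "The" and "Hague" were recognised as one word.
SAMPLE = f"""<alto xmlns="{ALTO_NS}">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine ID="line_1" HPOS="100" VPOS="200" WIDTH="600" HEIGHT="40">
      <String ID="s1" CONTENT="TheHague" HPOS="100" VPOS="200" WIDTH="160" HEIGHT="40"/>
    </TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""


def split_string(line, string, at):
    """Split a String child of `line` at character `at`; return the right half."""
    text = string.get("CONTENT")
    if not 0 < at < len(text):
        raise ValueError("split position must fall inside the word")
    hpos, width = int(string.get("HPOS")), int(string.get("WIDTH"))
    # Divide the box in proportion to character count (a rough approximation).
    left_width = round(width * at / len(text))
    string.set("CONTENT", text[:at])
    string.set("WIDTH", str(left_width))
    right = ET.Element(STRING, {
        "ID": string.get("ID") + "_b",  # hypothetical ID scheme
        "CONTENT": text[at:],
        "HPOS": str(hpos + left_width),
        "VPOS": string.get("VPOS"),
        "WIDTH": str(width - left_width),
        "HEIGHT": string.get("HEIGHT"),
    })
    space = ET.Element(SP, {"HPOS": str(hpos + left_width), "VPOS": string.get("VPOS")})
    index = list(line).index(string)
    line.insert(index + 1, space)
    line.insert(index + 2, right)
    return right


def merge_with_previous(line, string):
    """Remove an emptied String and stretch the preceding String over its box.

    The preceding String's CONTENT is left unchanged: the corrected text
    still has to be typed in by hand, as in the tool.
    """
    children = list(line)
    index = children.index(string)
    previous = next((c for c in reversed(children[:index]) if c.tag == STRING), None)
    if previous is None:
        raise ValueError("no preceding String to merge into")
    right_edge = int(string.get("HPOS")) + int(string.get("WIDTH"))
    previous.set("WIDTH", str(right_edge - int(previous.get("HPOS"))))
    # Drop everything after `previous` up to and including `string` (e.g. the SP).
    for child in children[children.index(previous) + 1:index + 1]:
        line.remove(child)


root = ET.fromstring(SAMPLE)
line = root.find(f".//{{{ALTO_NS}}}TextLine")
right = split_string(line, line.find(STRING), 3)
print([s.get("CONTENT") for s in line.iter(STRING)])  # ['The', 'Hague']
```

Calling merge_with_previous(line, right) afterwards removes the space and the second String again and restores the first String's width to cover the whole original box.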
ALTO Edit was used in a pilot on OCR post-correction at the KB (the National Library of the Netherlands). The full report is available via the IMPACT Centre of Competence.