The CHRONIC (Classified Historical Newspaper Images) dataset consists of metadata for 313K classified images harvested from Delpher's digitised Dutch newspapers for the period 1860-1922. Thomas Smits and Willem-Jan Faber created the CHRONIC database by applying several computer vision techniques to classify the images. CHRONIC was originally created to test whether state-of-the-art computer vision techniques could be applied to historical images, and to research when Dutch newspapers started to use photographs instead of drawings to visually represent the news.
We used a four-step pipeline to classify the images: a harvester, a face recognition classifier, a classification into nine different categories using TensorFlow's Inception-V3 convolutional neural network (CNN), and a classification of all images into photographs and drawings by a CNN built by Leonardo Impett.
In the first step of the pipeline, we harvested images from digitised Dutch newspapers. In the ALTO XML files of the digitised newspaper pages, the 'imageblock' element denotes images. Around 1900, Dutch newspapers contained many small images, such as the recurring illustrations used at the beginning of a specific section, or the small images that accompanied advertisements. Because we were mainly interested in images of the news, we decided to include only images that could be related to newspaper articles (via the XML file), to exclude images from advertisements, and to discard all images with a file size smaller than 30KB.
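The harvesting filter described above can be sketched as follows. This is a minimal illustration, not the actual harvester: the element and attribute names (`imageblock`, `related_article`, `filesize`) are assumptions standing in for the real Delpher/ALTO schema.

```python
import xml.etree.ElementTree as ET

# A tiny, synthetic ALTO-like page description. Element and attribute
# names are illustrative, not the exact Delpher/ALTO schema.
PAGE_XML = """
<page>
  <imageblock id="img1" related_article="art7" filesize="45000"/>
  <imageblock id="img2" related_article="" filesize="80000"/>
  <imageblock id="img3" related_article="art9" filesize="12000"/>
</page>
"""

MIN_SIZE = 30 * 1024  # discard images smaller than 30KB

def harvest(xml_text):
    """Keep only images linked to an article and at least 30KB in size."""
    root = ET.fromstring(xml_text)
    kept = []
    for block in root.iter("imageblock"):
        linked = block.get("related_article")  # proxy for "belongs to the news"
        size = int(block.get("filesize", "0"))
        if linked and size >= MIN_SIZE:
            kept.append(block.get("id"))
    return kept

print(harvest(PAGE_XML))  # only img1 is both article-linked and large enough
```

Here `img2` is dropped because it is not linked to an article (an advertisement, say), and `img3` because it is under the 30KB threshold.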
Faces and categories
In the second step, we used Adam Geitgey's facial recognition API, built on Dlib's face recognition library, to recognize faces in the images. In the third step, we applied TensorFlow's Inception-V3 convolutional neural network to recognize nine different categories:
- sheet music, and
- weather reports
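One way to picture the pipeline's output is as one metadata record per image, with a field per classification step. The sketch below is purely hypothetical: the field names and identifier format are illustrative, not the actual CHRONIC schema.

```python
from dataclasses import dataclass

@dataclass
class ImageRecord:
    # Hypothetical metadata record; field names are illustrative,
    # not the actual CHRONIC schema.
    image_id: str
    n_faces: int = 0             # step 2: face recognition
    category: str = "unknown"    # step 3: Inception-V3 category
    is_photograph: bool = False  # step 4: photograph vs. drawing

record = ImageRecord(image_id="paper_1903_p3_img1")
record.n_faces = 2
record.category = "crowds"
record.is_photograph = True
print(record)
```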
In recent years, machine learning has made tremendous progress in object detection and classification. Deep convolutional neural networks in particular achieve high performance on these kinds of tasks. Inception-V3 is a deep convolutional neural network trained for the ImageNet Large Scale Visual Recognition Challenge. To recognize our nine categories, we retrained Inception's final layer. Although the creators of this method acknowledge that it will be outperformed by a full training run, it is surprisingly effective (see below for performance) and does not require GPU hardware. We used training sets of around forty images per category.
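The idea behind retraining only the final layer is that the pretrained network is kept frozen as a feature extractor, and only a small classifier on top of its output features ("bottlenecks") is fit to the new categories. A conceptual sketch of this, with synthetic stand-in features, two classes instead of nine, and a plain logistic layer instead of the TensorFlow retraining script:

```python
import math
import random

random.seed(0)

# Synthetic stand-ins for the fixed bottleneck features a frozen
# Inception-V3 would produce; each class clusters around a centre.
def make_features(center, n):
    return [[random.gauss(c, 0.3) for c in center] for _ in range(n)]

X = make_features([1.0, 0.0], 40) + make_features([0.0, 1.0], 40)
y = [0] * 40 + [1] * 40  # e.g. 'chess' vs 'weather report' (illustrative)

# Only this final logistic layer is trained; the "network" stays fixed.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(200):  # plain stochastic gradient descent
    for xi, yi in zip(X, y):
        p = 1 / (1 + math.exp(-(w[0] * xi[0] + w[1] * xi[1] + b)))
        g = p - yi
        w[0] -= lr * g * xi[0]
        w[1] -= lr * g * xi[1]
        b -= lr * g

correct = sum(
    (1 / (1 + math.exp(-(w[0] * xi[0] + w[1] * xi[1] + b))) > 0.5) == bool(yi)
    for xi, yi in zip(X, y)
)
accuracy = correct / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

Because the expensive convolutional layers are never updated, this kind of retraining is cheap enough to run without GPU hardware, even with only around forty examples per category.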
We asked Leonardo Impett to build a CNN that could recognize whether images were drawings or photographs. Although this task sounds relatively simple, the heterogeneity of the material makes it quite hard. Building on the work of Paul Fyfe and Qian Ge, we decided to focus on reproduction techniques: engraving for illustrations and the half-tone process for photographs. Using MATLAB, Ge devised a method to analyse two low-level features of images: the pixel ratio (the number of low-intensity pixels divided by the total number of pixels) and the entropy level (the amount of information contained in the image). By juxtaposing these two features, they were able to sort the images of the illustrated newspapers according to the technique used for their reproduction. Half-tones, used to reproduce photographs, exhibit both a high pixel ratio and a high entropy level, while engravings, used to reproduce illustrations, display lower pixel ratios and entropy levels. Applying the technique of Fyfe and Ge, we found that it was relatively good at recognizing both high-quality engravings and photographs. However, Dutch newspapers mainly printed low-quality halftones and engravings, which were not recognized by their model. Furthermore, newspapers frequently used the half-tone technique to reproduce illustrations.
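The two low-level features can be computed directly from a grayscale image. The sketch below is our own illustration, not Ge's MATLAB code: the darkness threshold and the toy image patches are assumptions chosen to make the contrast visible.

```python
import math

def pixel_ratio(img, threshold=64):
    """Share of low-intensity (dark) pixels; img is a 2D list of 0-255 values.
    The threshold of 64 is an illustrative choice, not Ge's exact value."""
    pixels = [p for row in img for p in row]
    return sum(p < threshold for p in pixels) / len(pixels)

def entropy(img):
    """Shannon entropy (bits) of the grey-level histogram."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values()) + 0.0

# A flat mid-grey patch: no dark pixels, a single grey level, zero entropy.
flat = [[128] * 4 for _ in range(4)]
# A halftone-like checkerboard of near-black and near-white dots: half the
# pixels are dark and the histogram has two equally likely levels.
halftone = [[10 if (r + c) % 2 == 0 else 245 for c in range(4)] for r in range(4)]

print(pixel_ratio(flat), entropy(flat))          # 0.0 0.0
print(pixel_ratio(halftone), entropy(halftone))  # 0.5 1.0
```

The halftone-like patch scores high on both features, as the half-tone process does in Fyfe and Ge's analysis, while the flat patch (like the large uniform areas of an engraving) scores low on both.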
Impett built a classifier that uses the lower layers of a CNN as a feature extractor and trained a support vector machine (SVM) on those features to divide the images into photographs and illustrations. This method is based on the same idea as Fyfe and Ge's, but uses the lower layers of a CNN instead of pixel ratios and entropy levels.
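A minimal sketch of the SVM half of this setup, under stated assumptions: the 2-D feature vectors are synthetic stand-ins for lower-layer CNN activations, and the SVM is a bare linear one trained by subgradient descent on the hinge loss rather than whatever solver Impett used.

```python
import random

random.seed(1)

# Synthetic stand-ins for lower-layer CNN features; labels are
# +1 (photograph) / -1 (illustration).
def make_points(center, n):
    return [[random.gauss(c, 0.4) for c in center] for _ in range(n)]

X = make_points([2.0, 2.0], 30) + make_points([-2.0, -2.0], 30)
y = [1] * 30 + [-1] * 30

# Linear SVM via subgradient descent on the regularized hinge loss.
w, b, lr, lam = [0.0, 0.0], 0.0, 0.01, 0.01
for _ in range(300):
    for xi, yi in zip(X, y):
        margin = yi * (w[0] * xi[0] + w[1] * xi[1] + b)
        # The regularization subgradient always applies; the hinge term
        # only when the margin constraint is violated.
        w[0] -= lr * lam * w[0]
        w[1] -= lr * lam * w[1]
        if margin < 1:
            w[0] += lr * yi * xi[0]
            w[1] += lr * yi * xi[1]
            b += lr * yi

def predict(xi):
    return 1 if w[0] * xi[0] + w[1] * xi[1] + b >= 0 else -1

accuracy = sum(predict(xi) == yi for xi, yi in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

The design point is the division of labour: the CNN layers supply features in which the two reproduction techniques separate, and the SVM only has to find a linear boundary between them.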
To calculate the F1-scores of the applied computer vision techniques, we manually tagged 500 random images from the entire dataset (1860-1922), 500 random images from the years before 1900 (1860-1900), and 500 images from the years after 1900 (1900-1922). With an F1-score of around 0.85, Impett's CNN can be confidently used to recognize photographs in digitized visual source material of this period. The high scores for the chess and weather categories show that Inception is very good at recognizing images with a high degree of visual similarity. Although it has more trouble with conceptual similarity, the F1-scores for 'maps,' 'buildings,' and 'crowds' show that this method can also be used for these kinds of tasks. The same goes, albeit to a lesser extent, for the category 'cartoon,' which records whether images are stylistically similar.
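For reference, the F1-score is the harmonic mean of precision and recall, computed from the manually tagged sample. The counts in this sketch are made up for illustration; they are not the actual CHRONIC evaluation figures.

```python
def f1_score(true_labels, predicted, positive):
    """F1 for one class, from manual tags (truth) and classifier output."""
    tp = sum(t == positive and p == positive for t, p in zip(true_labels, predicted))
    fp = sum(t != positive and p == positive for t, p in zip(true_labels, predicted))
    fn = sum(t == positive and p != positive for t, p in zip(true_labels, predicted))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up evaluation sample: manual tags vs. classifier predictions.
truth = ["photo", "photo", "photo", "drawing", "drawing", "drawing"]
preds = ["photo", "photo", "drawing", "drawing", "drawing", "photo"]
print(round(f1_score(truth, preds, "photo"), 2))  # tp=2, fp=1, fn=1 -> 0.67
```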
| Category | F1-score (1860-1900) | F1-score (1900-1922) | F1-score (1860-1922) |
| --- | --- | --- | --- |
 Adam Geitgey, Face_recognition: The World's Simplest Facial Recognition API for Python and the Command Line, 2017, https://github.com/ageitgey/face_recognition.
 Jeff Donahue et al., “DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition,” arXiv:1310.1531 [cs], October 5, 2013, http://arxiv.org/abs/1310.1531; “How to Retrain Inception’s Final Layer for New Categories,” TensorFlow, accessed November 23, 2017, https://www.tensorflow.org/tutorials/image_retraining.
 “How to Retrain Inception’s Final Layer for New Categories.”