We designed DIGGER to study the evolution of the Dutch urban system by investigating information flows extracted from historical newspapers dating back to 1869. Newspapers are rich in geographical information, as most news items include one or more place names. However, studying this geographical information systematically is not an easy task. In this project, we geocoded the place names contained in a selection of 102 million news items by developing a method that couples fast SRU queries with Named-Entity Recognition. The resulting dataset makes it possible to create origin-destination matrices representing information flows between cities over more than a century.
The detailed description of the data collection can be found in the data paper section of the journal Cybergeo.
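To give a rough idea of what an origin-destination matrix of information flows can look like, here is a minimal Python sketch. It is not the DIGGER pipeline: the record fields `published_in` and `places`, the toy records, and the choice to treat the mentioned place as the origin and the newspaper's city of publication as the destination are all assumptions made for illustration only.

```python
from collections import Counter


def build_od_matrix(items):
    """Count flows from each mentioned place to the city of publication.

    `items` is a list of dicts with the hypothetical keys
    `published_in` (a city name) and `places` (a list of place names).
    """
    flows = Counter()
    for item in items:
        destination = item["published_in"]
        # Count each place at most once per news item and skip self-mentions.
        for origin in set(item["places"]):
            if origin != destination:
                flows[(origin, destination)] += 1
    return flows


# Toy records invented for illustration; they are not taken from the dataset.
items = [
    {"published_in": "Amsterdam", "places": ["Rotterdam", "Utrecht"]},
    {"published_in": "Amsterdam", "places": ["Rotterdam", "Rotterdam"]},
    {"published_in": "Groningen", "places": ["Amsterdam", "Groningen"]},
]

flows = build_od_matrix(items)
for (origin, destination), count in sorted(flows.items()):
    print(f"{origin} -> {destination}: {count}")
```

Storing the matrix as a sparse mapping of (origin, destination) pairs keeps the sketch small; for the real data, the counting rules and the direction of the flows should be taken from the data paper rather than from this example.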
DIGGER was created while Antoine Peris was a researcher-in-residence at the KB. During the data collection, Antoine was assisted by Willem Jan Faber from the Research Department of the KB.
The authors would like to thank Martijn Kleppe, Lotte Wilms and Steven Claeyssens for their availability and support during and after the creation of the dataset. We also thank Evert Meijers and Maarten van Ham for their enthusiasm and for the fruitful discussions, which were very important for the design of the data collection.
- Peris, A., Faber, W.J., Meijers, E., van Ham, M., 2020. One century of information diffusion in the Netherlands derived from a massive digital archive of historical newspapers: the DIGGER dataset. Cybergeo : European Journal of Geography. https://doi.org/10.4000/cybergeo.33747
- Peris, A., Meijers, E., van Ham, M., 2020. Information diffusion between cities: Revisiting Zipf and Pred with a computational social science approach [under review].