Workshop 'Mining Delpher Data' at DHBenelux

KB Lab members Steven Claeyssens and Martijn Kleppe will organize a workshop during the DHBenelux conference, together with researcher-in-residence Melvin Wevers and KB colleague René Voorburg.

The workshop is titled 'Mining Delpher Data' and will focus on accessing, cleaning and analysing KB data. Details can be found below and on the conference website. To sign up for the workshop, please send an e-mail to the organisers.

Mining Delpher Data - Harvest, Clean and Analyse large amounts of digitised texts

When analysing sources of the National Library of the Netherlands (KB), researchers often use Delpher, the online gateway to more than 10 million pages of historical text (newspapers, books, journals and radio bulletins), mostly in Dutch. Delpher lets you search and browse all documents in full text, making it a good resource for close reading. For distant reading of large amounts of data, however, the KB gives researchers bulk access to the digital images, metadata and full text via KB’s Dataservices & APIs, as well as to additional data such as the Medieval Illuminated Manuscripts and the Dutch Digital Parliamentary Papers. To successfully harvest this data and subsequently clean and analyse it, you need knowledge about:

  1. the KB’s data formats and infrastructure,
  2. tools to clean the data, and
  3. tools to analyse the data.
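
To give a rough idea of what the cleaning and analysis steps can look like in practice, here is a minimal Python sketch. It is not the workshop material: the helper names `clean_ocr_text` and `word_frequencies` and the sample sentence are made up for illustration, and the text is assumed to have been harvested already. The sketch rejoins words that OCR split across a line break, normalises whitespace and case, and then counts word frequencies as a very simple form of distant reading.

```python
import re
from collections import Counter


def clean_ocr_text(raw: str) -> str:
    """Normalise a page of OCR text (hypothetical helper, for illustration)."""
    # Rejoin words hyphenated across a line break: "Konink-\nlijke" -> "Koninklijke".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Collapse runs of spaces, tabs and newlines into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def word_frequencies(text: str, min_length: int = 3) -> Counter:
    """Count alphabetic tokens of at least min_length characters."""
    # [^\W\d_]+ matches runs of letters only, including accented letters.
    tokens = re.findall(r"[^\W\d_]+", text)
    return Counter(t for t in tokens if len(t) >= min_length)


# Made-up sample text standing in for a harvested newspaper page.
sample = "De Konink-\nlijke   Bibliotheek bewaart kranten.\nKranten uit de 19e eeuw."
cleaned = clean_ocr_text(sample)
freqs = word_frequencies(cleaned)
print(cleaned)
print(freqs.most_common(3))
```

Real OCR output from historical newspapers usually needs more care than this, for example handling of misrecognised characters and historical spelling variants.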

During this workshop, you will get hands-on experience with, and guidance on, all three steps. KB experts (René Voorburg, Steven Claeyssens and Martijn Kleppe) will first guide you through KB’s metadata and available datasets. Then a PhD researcher from Utrecht University (Melvin Wevers) will show you which tools are available to clean the data and will assist you in making your first analyses.

During the first part of the workshop you will be guided through a number of exercises, with all participants using the same dataset. During the second part you can start freely collecting and working with the selection of KB datasets that best fits your research interests, under the guidance of KB experts.

This workshop is aimed specifically at beginning users who have an interest in KB data. We assume no prior experience with KB (meta)data and no other significant technical knowledge or skills, such as programming, although basic computer skills are expected. The workshop will be in English. All data that we will work with will be in Dutch.

Program

09.30 - 09.40 Opening - Martijn Kleppe (KB)

09.40 - 10.00 Datasets at the Koninklijke Bibliotheek - Steven Claeyssens (KB)

10.00 - 11.00 Harvest data at the KB - René Voorburg (KB)

11.00 - 11.15 Break

11.15 - 12.00 Harvest data with Jupyter Notebooks - Melvin Wevers (UU)

12.00 - 13.00 Lunch

13.00 - 13.45 Cleaning data - Melvin Wevers (UU)

13.45 - 14.30 Analysing data - Melvin Wevers (UU)

14.30 - 15.15 Hands-on

15.15 - 15.30 Closing - Martijn Kleppe (KB)

15.30 End of workshop