Workshop: Using the Frame generator


Exercise: Collect ANP OCR data

To experiment with the Frame generator, first collect OCR files from the ANP collection.

  1. Define a basic jSRU search query that selects a subset of the collection relevant to your research.
  2. Run this query for several different periods.
  3. Use the resolver to retrieve the OCR text (about three files per selected period).
    • Use ‘Save as…’ in your browser to store the result.
    • Note: save the file with the .xml extension!
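The three steps can also be scripted. The Python sketch below is a minimal illustration under stated assumptions. The `date within "YYYY YYYY"` clause in `period_query` stands in for whatever date index jSRU actually exposes for the ANP collection. `save_ocr` expects you to supply a complete resolver URL yourself, since the resolver URL format is not given here. The helper names are hypothetical.

```python
from pathlib import Path
from urllib.request import urlopen


def period_query(base_query: str, start_year: int, end_year: int) -> str:
    """Restrict a base CQL query to one period (step 2).

    ASSUMPTION: the index name 'date' and the 'within "YYYY YYYY"' range
    syntax are illustrative; check which date index jSRU offers for ANP.
    """
    return f'({base_query}) AND date within "{start_year} {end_year}"'


def xml_filename(name: str) -> str:
    """Append .xml unless the name already ends with it (case-insensitive)."""
    return name if name.lower().endswith(".xml") else name + ".xml"


def save_ocr(resolver_url: str, name: str, directory: str = ".") -> Path:
    """Download one OCR document from a resolver URL and store it as .xml (step 3).

    The resolver URL format is not specified here; pass a complete URL.
    """
    target = Path(directory) / xml_filename(name)
    with urlopen(resolver_url) as response:
        target.write_bytes(response.read())
    return target


if __name__ == "__main__":
    # One query per period; these periods are examples only.
    for start, end in [(1940, 1945), (1960, 1965), (1980, 1984)]:
        print(period_query("oranje", start, end))
```

Forcing the extension in code has the same effect as the manual ‘Save as…’ tip: each stored file ends in .xml.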

About the ANP typescript collection

The ANP dataset consists of about 1.5 million digitized typescripts of radio news broadcasts from 1937 to 1984. It is available through the Delpher website as the Radiobulletins collection.

KB offers the data under (semi-)open licenses:

  • CC0 license for the metadata
  • CC BY-NC-ND license for images and full-text objects

Cheatsheet

Using jSRU

Basic jSRU example queries

The base URL for the search API is http://jsru.kb.nl/sru/sru.
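A search request consists of the base URL plus query parameters. The sketch below assembles such a URL from the standard SRU `searchRetrieve` parameters (`operation`, `version`, `query`, `startRecord`, `maximumRecords`). The `x-collection=ANP` parameter is an assumption about how jSRU selects the ANP collection, so verify it against a working example query before relying on it. `urlencode` also takes care of the character encoding described under CQL query syntax.

```python
from urllib.parse import urlencode

BASE_URL = "http://jsru.kb.nl/sru/sru"


def search_url(cql: str, start: int = 1, max_records: int = 10,
               collection: str = "ANP") -> str:
    """Build a jSRU searchRetrieve URL for a CQL query (sketch)."""
    params = {
        "operation": "searchRetrieve",  # standard SRU operation
        "version": "1.2",               # SRU version (assumed accepted by jSRU)
        "query": cql,
        "startRecord": start,           # 1-based position of the first record
        "maximumRecords": max_records,  # page size
        "x-collection": collection,     # ASSUMPTION: KB-specific collection selector
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url('"radio" AND oranje'))
```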

CQL query syntax

SRU uses CQL (Contextual Query Language) as its query language. Some examples:

Please note: queries must use properly encoded special characters. For example, a space should be replaced by %20 or +, and a double quotation mark by %22. Most browsers take care of this encoding automatically, but if you run into problems you can encode your query using the URL Encoding Reference. When entering double quotes, use straight quotes (") rather than curly quotes (“ ”).
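These encoding rules can be checked with Python's standard library. In this sketch, `quote` encodes a space as `%20` and `quote_plus` encodes it as `+`, and both turn a straight double quote into `%22`. Curly quotes are first replaced by straight ones, as the note recommends. The helper names `straighten_quotes` and `encode_query` are hypothetical and exist only for illustration.

```python
from urllib.parse import quote, quote_plus


def straighten_quotes(cql: str) -> str:
    """Replace curly double quotes (U+201C, U+201D) with straight ones."""
    return cql.replace("\u201c", '"').replace("\u201d", '"')


def encode_query(cql: str, plus_for_space: bool = False) -> str:
    """Percent-encode a CQL query; spaces become %20, or + if requested."""
    cql = straighten_quotes(cql)
    if plus_for_space:
        return quote_plus(cql, safe="")
    return quote(cql, safe="")


if __name__ == "__main__":
    print(encode_query("\u201cradio\u201d AND oranje"))        # %22radio%22%20AND%20oranje
    print(encode_query('"radio" AND oranje', plus_for_space=True))  # %22radio%22+AND+oranje
```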

ANP resolver links

Advanced techniques

Using jSRU faceting

Setting the maximumRecords parameter to 0 ensures that only the faceted results are present in the response. The x-facetprefix parameter takes a value from 0 to 3, which sets the temporal resolution of the facet: 0 = decade, 1 = year, 2 = month, and 3 = day. The x-facetname and x-facets parameters indicate which facet is requested.
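A faceting request might be assembled as in the following sketch. The mapping from resolution names to x-facetprefix values follows the description above. The correct values for x-facetname and x-facets are not given here, so the function takes them as arguments and the usage line passes placeholders. Including `operation=searchRetrieve` and `version=1.2` is an assumption based on standard SRU.

```python
from urllib.parse import urlencode

BASE_URL = "http://jsru.kb.nl/sru/sru"

# x-facetprefix values as described: 0=decade, 1=year, 2=month, 3=day
RESOLUTIONS = {"decade": 0, "year": 1, "month": 2, "day": 3}


def facet_url(cql: str, resolution: str, facet_name: str, facets: str) -> str:
    """Build a jSRU URL that returns only facet counts (sketch)."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    params = {
        "operation": "searchRetrieve",  # ASSUMPTION: standard SRU operation
        "version": "1.2",               # ASSUMPTION: standard SRU version
        "query": cql,
        "maximumRecords": 0,            # no records, only the facets
        "x-facetprefix": RESOLUTIONS[resolution],
        "x-facetname": facet_name,      # HYPOTHETICAL: value supplied by caller
        "x-facets": facets,             # HYPOTHETICAL: value supplied by caller
    }
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(facet_url("oranje", "year", "<facet name>", "<facets>"))
```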

Retrieving metadata from KB-MDO

Some tips for further exploration:

oai2linerec is a simple tool, running on both Mac and Linux, for harvesting metadata from MDO or any other OAI-PMH repository. For example, try: ./oai2linerec.sh -s anp -p didl -b http://services.kb.nl/mdo/oai -o output.txt
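Harvesting of this kind rests on the OAI-PMH `ListRecords` request. The Python sketch below shows the protocol loop: request a page, then follow the `resumptionToken` until the repository returns an empty one. Reading `-s anp` as the OAI-PMH `set` argument and `-p didl` as `metadataPrefix` is an inference from the flag names, not something confirmed by the oai2linerec documentation. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url: str, metadata_prefix: Optional[str] = None,
                     set_spec: Optional[str] = None,
                     token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords URL (first page or continuation)."""
    if token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        if not metadata_prefix:
            raise ValueError("metadataPrefix is required for the first request")
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def next_token(xml_bytes: bytes) -> Optional[str]:
    """Return the resumption token of a response page, or None if harvesting is done."""
    root = ET.fromstring(xml_bytes)
    elem = root.find(f".//{OAI_NS}resumptionToken")
    if elem is None or not (elem.text or "").strip():
        return None
    return elem.text.strip()


def harvest(base_url: str, metadata_prefix: str,
            set_spec: Optional[str] = None) -> Iterator[bytes]:
    """Yield raw ListRecords response pages until no resumption token remains."""
    token = None
    while True:
        url = list_records_url(base_url, metadata_prefix, set_spec, token)
        with urlopen(url) as response:
            page = response.read()
        yield page
        token = next_token(page)
        if token is None:
            break
```

For example, `harvest("http://services.kb.nl/mdo/oai", "didl", "anp")` would request the same set and metadata format as the oai2linerec command, if the flag reading above is correct.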

A convenient Python-based wrapper for accessing both jSRU and KB-MDO can be found at http://lab.kb.nl/tool/python-api.