RDA Entity Finder
Find Works, Expressions and Manifestations of novels in the Dutch National Bibliography
The RDA Entity Finder enables you to browse through the bibliographic Work, Expression, Manifestation and Item entities. These are the so-called WEMI-entities of the IFLA Library Reference Model (LRM).
The WEMI entity framework is constructed on top of the ‘flat’ traditional bibliographic records. This required an extensive transformation of the bibliographic metadata.
First, the bibliographic metadata in the native PICA format was imported into a Postgres database. PICA subfields were then transformed to RDA URIs, combining data from the RDA Registry with internal mappings and stored procedures to handle the many data conditions and exceptions.
The second step was the construction of “Authorized Access Points” (AAPs). In each bibliographic record, an AAP was constructed for each of the WEMI entities. These AAPs served as the entities' fingerprints.
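As an illustration of the fingerprint idea, the following is a minimal Python sketch, not the project's actual implementation (which the text describes as running in Postgres). The record fields (`creator`, `original_title`, `language`, `publisher`, `year`), the normalization rules, and the way the three access points are composed are all assumptions made for this example.

```python
import re
import unicodedata


def normalize(text):
    """Lowercase, strip diacritics and collapse punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", without_marks.lower()).strip()


def build_aaps(record):
    """Build one access point string per entity level for a single record.

    The field names and the composition below are hypothetical; they only
    show how each level can extend the fingerprint of the level above it.
    """
    work = f"{normalize(record['creator'])}. {normalize(record['original_title'])}"
    expression = f"{work}. {normalize(record['language'])}"
    manifestation = f"{expression}. {normalize(record['publisher'])}. {record['year']}"
    return {"work": work, "expression": expression, "manifestation": manifestation}


# Invented sample record, for illustration only.
sample = {
    "creator": "Jansen, Anna",
    "original_title": "Het Huis aan de Rivier",
    "language": "Dutch",
    "publisher": "Voorbeeld Uitgeverij",
    "year": 1998,
}
aaps = build_aaps(sample)
```

Because each level reuses the string of the level above, two records that share a Manifestation fingerprint in this sketch necessarily share the Expression and Work fingerprints as well.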
The next step was to cluster, for each WEMI entity, all the records with the same fingerprint (AAP). In this clustering we used the Levenshtein algorithm (edit distance) to improve the results.
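The text does not say how the Levenshtein algorithm was applied. One plausible reading is that fingerprints within a small edit distance of each other are treated as the same entity. The Python sketch below works under that assumption; the function names, the greedy strategy, and the `max_distance` threshold are hypothetical choices for this example.

```python
def levenshtein(a, b):
    """Return the edit distance between two strings (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def cluster_by_fingerprint(records, max_distance=2):
    """Greedily group (record_id, aap) pairs whose AAPs are near-identical.

    Each cluster is represented by the first AAP placed in it. A record
    joins the first cluster whose representative is within max_distance.
    """
    clusters = []  # list of (representative AAP, [record ids])
    for record_id, aap in records:
        for representative, members in clusters:
            if levenshtein(aap, representative) <= max_distance:
                members.append(record_id)
                break
        else:
            clusters.append((aap, [record_id]))
    return clusters


# Invented fingerprints, for illustration only.
records = [
    (1, "jansen anna. het huis aan de rivier"),
    (2, "jansen anna. het huiz aan de rivier"),  # one-character typo
    (3, "de vries piet. de tuin"),
]
clusters = cluster_by_fingerprint(records)
```

A greedy pass like this compares every record against every cluster representative, so it is only suitable for small samples; on a full dataset one would typically first block on exact matches of part of the fingerprint and apply the edit distance within each block.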
The results of the WEMI clustering depend heavily on the quality of the bibliographic metadata, so the transformation from PICA to RDA was preceded by data analysis and data improvements. A “Metadata infrastructure” was developed to support a cyclic process of continuous improvement.
In fact the WEMI structures offer new ways to analyze and improve the source metadata.
The construction of the WEMI framework for the novels is therefore still a work in progress. Both the data and the transformation procedures are evolving. For this reason the data of the RDA Entity Finder will be refreshed periodically.
We hope to publish Work URIs in our Linked Data environment next year. URI design and persistence are issues that still need to be considered.
The transformed dataset consists of novels in the Dutch National Bibliography.
These novels were expected to be a relatively easy dataset to transform.
164,394 bibliographic records
Are transformed to: