Code created during the KB Research in Residence project "Why girls smile and boys don't cry". This repository provides tools for training and fine-tuning word embedding models (Word2Vec and FastText) on a selected subset of the Dutch newspapers available in Delpher.
It also comes with various functions for exploring the trained embeddings. Lexicon expansion allows you to "travel through a vector space" and interactively build a lexicon of conceptually related words along the way. The Bias folder contains tools for analysing bias over time and along other dimensions, such as political leaning and place. You can, for example, inspect how bias changes over time and compare its evolution across different facets. Besides these timelines, you can zoom in on a specific year and inspect the words that drive the differences, either by plotting the distribution of bias scores per facet and category or by ranking words by their bias scores. The notebook gives an overview of all functions for analysing bias in word embeddings trained on the Delpher newspapers.
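
To make the ideas of lexicon expansion and bias scores concrete, here is a minimal sketch in plain Python. It is not the code from this repository: the function names (`expand_lexicon`, `bias_score`, `rank_by_bias`), the three-dimensional toy vectors, and the choice of metric (the difference in cosine similarity to two averaged "pole" vectors) are all assumptions made for illustration. The notebook documents the functions that are actually provided.

```python
import math

# Toy 3-dimensional vectors standing in for trained embeddings (made-up values).
vectors = {
    "vrouw": [1.0, 0.1, 0.0],
    "meisje": [0.9, 0.2, 0.1],
    "man": [0.0, 0.1, 1.0],
    "jongen": [0.1, 0.2, 0.9],
    "glimlachen": [0.8, 0.5, 0.2],
    "vechten": [0.2, 0.5, 0.8],
    "krant": [0.5, 1.0, 0.5],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(words, vectors):
    """Average the vectors of all words that are in the vocabulary."""
    present = [vectors[w] for w in words if w in vectors]
    dims = len(present[0])
    return [sum(vec[i] for vec in present) / len(present) for i in range(dims)]


def expand_lexicon(seeds, vectors, topn=3):
    """Suggest the words closest to the centroid of the current seed lexicon."""
    centroid = mean_vector(seeds, vectors)
    candidates = [
        (word, cosine(vec, centroid))
        for word, vec in vectors.items()
        if word not in seeds
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:topn]


def bias_score(word, pole_a, pole_b, vectors):
    """Positive: closer to pole_a. Negative: closer to pole_b (assumed metric)."""
    target = vectors[word]
    return cosine(target, mean_vector(pole_a, vectors)) - cosine(
        target, mean_vector(pole_b, vectors)
    )


def rank_by_bias(targets, pole_a, pole_b, vectors):
    """Rank target words from most pole_a-leaning to most pole_b-leaning."""
    scored = [
        (word, bias_score(word, pole_a, pole_b, vectors))
        for word in targets
        if word in vectors
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


female_pole = ["vrouw", "meisje"]
male_pole = ["man", "jongen"]

suggestions = expand_lexicon(["vrouw"], vectors, topn=2)
ranking = rank_by_bias(["vechten", "krant", "glimlachen"], female_pole, male_pole, vectors)

for word, score in ranking:
    print(f"{word}: {score:+.3f}")
```

In an interactive setting you would review `suggestions`, add the words you accept to the seed list, and call `expand_lexicon` again. Running `rank_by_bias` on models trained on different years or newspaper subsets would give the per-facet scores that a timeline plots.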