Former researcher-in-residence Dr. Frank Harbers will present a virtual paper at DH2017 on the research that he and Juliette Lonij carried out during his time at the KB Lab, which resulted in the Genre Classifier.
Abstract
This paper examines the opportunities, approaches and issues of automatically classifying historical newspaper articles from the Netherlands by ‘genre’, understood as an expression of the historically and culturally determined conception of journalism. Ultimately, it outlines a concrete machine learning approach that applies linear and non-linear classifiers to predict the genre of a newspaper article. As part of this, the paper discusses the different tools we tried out and the problems we encountered in the process. Specifically, it reflects on how the rule-based approach to determining genre in the manual content analysis relates to the training of an automatic classifier based on machine learning techniques.
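To give a rough idea of what "linear and non-linear classifiers" can mean for genre prediction, here is a minimal Python sketch using only the standard library. It is not the method, feature set or tooling used in the paper: the perceptron (linear) and nearest-neighbour (non-linear) classifiers, the bag-of-words features, the two genre labels and the English toy sentences are all assumptions made up for illustration.

```python
from collections import Counter
import math

# Invented toy examples with hypothetical genre labels;
# NOT taken from the historical newspaper corpus.
TRAIN = [
    ("minister announced new budget in parliament", "news"),
    ("i asked the author in paris why she wrote", "interview"),
    ("police reported fire in harbour district", "news"),
    ("he told me he never wanted fame", "interview"),
]

def features(text):
    """Bag-of-words term counts for one article."""
    return Counter(text.lower().split())

def predict_linear(weights, text):
    """Pick the genre whose weight vector gives the highest linear score."""
    feats = features(text)
    def score(label):
        return sum(weights[label][word] * count for word, count in feats.items())
    # sorted() makes tie-breaking deterministic (alphabetically first label wins)
    return max(sorted(weights), key=score)

def train_perceptron(data, epochs=10):
    """Linear classifier: multiclass perceptron, one weight vector per genre."""
    weights = {label: Counter() for _, label in data}
    for _ in range(epochs):
        for text, gold in data:
            pred = predict_linear(weights, text)
            if pred != gold:
                # Move weights towards the correct genre, away from the wrong one
                for word, count in features(text).items():
                    weights[gold][word] += count
                    weights[pred][word] -= count
    return weights

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict_knn(data, text, k=1):
    """Non-linear classifier: majority genre among the k most similar articles."""
    feats = features(text)
    ranked = sorted(data, key=lambda item: cosine(feats, features(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

weights = train_perceptron(TRAIN)
print(predict_linear(weights, "police announced new budget"))
print(predict_knn(TRAIN, "she asked the author why"))
```

The contrast with a rule-based approach is that no explicit genre rules are written here: both classifiers derive their decisions from labelled examples, so the quality and consistency of the manual annotation directly shapes what they learn.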