The Keyword Generator is a command-line tool written in Python that can be downloaded from GitHub.
- Python 2.7 must be installed to run the Keyword Generator.
- Gensim must be installed as well; it is used for topic modelling and for calculating tf-idf scores.
- Mallet must be installed only if the Mallet topic modelling option is used.
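The first two requirements can be checked from Python before running the tool. The following is only a minimal sketch and not part of the Keyword Generator: the helper names `module_available` and `check_requirements` are hypothetical. Mallet is not checked, because the sketch makes no assumption about where it is installed.

```python
import importlib
import sys


def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def check_requirements():
    """Report whether the Python version and Gensim requirements are met."""
    return {
        "python_2_7": sys.version_info[:2] == (2, 7),
        "gensim": module_available("gensim"),
    }


# Print one line per requirement, e.g. "gensim" followed by "ok" or "missing".
for requirement, satisfied in sorted(check_requirements().items()):
    print("%-12s %s" % (requirement, "ok" if satisfied else "missing"))
```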
Installing the Keyword Generator only requires unpacking the zip file or downloading the source code from GitHub. This creates a folder ‘keyword-generator’ containing three files (corpus.py, keywords_lda.py, keywords_tfidf.py) and one subfolder, ‘data’. The ‘data’ subfolder in turn contains three subfolders of its own: ‘documents’, ‘models’, and ‘stop_words’.
Users who want stop words to be excluded can put their own stop word lists in the ‘stop_words’ folder. The Keyword Generator itself is not language specific, but stop word lists are. The text or collection of texts from which the Keyword Generator will derive its keyword list goes in the ‘documents’ folder. The input should consist of one or more plain text files (.txt extension, UTF-8 encoded).
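Because the input must be UTF-8 encoded .txt files, it can help to check the ‘documents’ folder before running the tool. The following is a minimal sketch and not part of the Keyword Generator: the function name `find_invalid_documents` is hypothetical, and it assumes that only files directly inside the folder matter (subfolders are skipped).

```python
import io
import os


def find_invalid_documents(documents_dir):
    """Return (filename, reason) pairs for files that are not UTF-8 .txt files."""
    problems = []
    for name in sorted(os.listdir(documents_dir)):
        path = os.path.join(documents_dir, name)
        if not os.path.isfile(path):
            continue  # assumption: subfolders are ignored
        if not name.lower().endswith(".txt"):
            problems.append((name, "not a .txt file"))
            continue
        try:
            # Reading the whole file forces a full decode as UTF-8.
            with io.open(path, encoding="utf-8") as handle:
                handle.read()
        except UnicodeDecodeError:
            problems.append((name, "not valid UTF-8"))
    return problems


# Run from within the 'keyword-generator' folder; does nothing elsewhere.
documents_dir = os.path.join("data", "documents")
if os.path.isdir(documents_dir):
    for name, reason in find_invalid_documents(documents_dir):
        print("%s: %s" % (name, reason))
```

An empty result means every file in the folder has a .txt extension and decodes as UTF-8.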
Once installed, the Keyword Generator can be started by entering 'python keywords_lda.py' or 'python keywords_tfidf.py' at the command line from within the ‘keyword-generator’ folder. Detailed instructions, with an explanation of all available options, were written by Dr. Pim Huijnen on the KB Research Blog. A brief overview with some example commands can be found on GitHub.