A semi-supervised Hidden Markov topic model based on prior knowledge
- Publication Type: Conference Proceeding
- Communications in Computer and Information Science, 2018, vol. 845, pp. 265-276
© Springer Nature Singapore Pte Ltd. 2018. A topic model is an unsupervised model that automatically discovers the topics discussed in a collection of documents. Most existing topic models use only bag-of-words representations or single-word distributions and do not consider relations between words. As a consequence, these models may generate topics that agree poorly with human judgments of topic coherence. To mitigate this issue, we present a topic model that employs topically related knowledge from prior topics together with word co-occurrence and relations in the collection. To incorporate the prior knowledge, we leverage a two-stage semi-supervised Markov topic model. In the first stage, we estimate a transition matrix and a low-dimensional vocabulary for the final topic model. In the second stage, we produce the final topic model, in which topic assignment follows a Markov chain process. Experiments on real text documents from a major compensation agency demonstrate improvements in both the PMI score and topic coherence.
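The abstract reports results in terms of a PMI score but does not say which variant is used. As a rough illustration only, the sketch below computes one common form: the average pairwise pointwise mutual information of a topic's top words, estimated from document-level co-occurrence. The function name `pmi_score`, the smoothing constant `eps`, and the toy corpus are all assumptions made for this example and are not taken from the paper.

```python
import math
from itertools import combinations


def pmi_score(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words.

    Probabilities are estimated from document-level co-occurrence,
    which is an assumption of this sketch, not the paper's stated setup.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1 = doc_freq(w1) / n_docs
        p2 = doc_freq(w2) / n_docs
        p12 = doc_freq(w1, w2) / n_docs
        if p1 == 0 or p2 == 0:
            continue  # skip pairs with a word that never occurs
        # eps keeps the logarithm finite when the pair never co-occurs.
        scores.append(math.log((p12 + eps) / (p1 * p2)))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy corpus; the words are invented for illustration.
docs = [
    ["claim", "injury", "payment"],
    ["claim", "injury"],
    ["payment", "invoice"],
    ["invoice", "payment", "claim"],
]

coherent = pmi_score(["claim", "injury"], docs)   # words that co-occur
mixed = pmi_score(["injury", "invoice"], docs)    # words that never co-occur
print(coherent > mixed)
```

Under this definition, a topic whose top words tend to appear in the same documents scores higher than one that mixes unrelated words, which is the behaviour a coherence measure of this kind is meant to capture.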