Concept-based topic model improvement

Publisher: Springer Berlin Heidelberg
Publication Type: Conference Proceeding
Citation: Studies in Computational Intelligence, 2011, 369, pp. 133-142
Issue Date: 2011-10-24
We propose a system that uses conceptual knowledge to improve topic models by removing unrelated words from the simplified topic description. We use WordNet to detect which topical words are not conceptually similar to the others, and then test our assumptions against human judgment. Results obtained on two different corpora under different test conditions show that the words detected as unrelated were much more likely than the others to be chosen by human evaluators as not being part of the topic at all. We show that there is a strong correlation between this probability and an automatically calculated topical fitness, and we discuss how the correlation varies with the method and data used. © 2011 Springer-Verlag Berlin Heidelberg.
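
The general idea of flagging the topical word that is least conceptually similar to the others can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which WordNet similarity measure or fitness formula was used, so the sketch assumes a simple path-based similarity and defines fitness as the mean similarity of a word to the rest of the topic. To stay self-contained it replaces WordNet with a small hand-made hypernym table (`TOY_HYPERNYMS`), and all function names are hypothetical.

```python
# Hypothetical toy taxonomy (child -> hypernym), standing in for WordNet.
TOY_HYPERNYMS = {
    "dog": "mammal",
    "cat": "mammal",
    "horse": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": "entity",
    "invoice": "document",
    "document": "artifact",
    "artifact": "entity",
}


def path_to_root(word):
    """Return the chain word -> hypernym -> ... -> root of the toy taxonomy."""
    path = [word]
    while path[-1] in TOY_HYPERNYMS:
        path.append(TOY_HYPERNYMS[path[-1]])
    return path


def path_similarity(a, b):
    """Assumed measure: 1 / (1 + edges between a and b via their
    lowest common ancestor); 0.0 if they share no ancestor."""
    depth_in_b = {node: i for i, node in enumerate(path_to_root(b))}
    for i, node in enumerate(path_to_root(a)):
        if node in depth_in_b:
            return 1.0 / (1 + i + depth_in_b[node])
    return 0.0


def topical_fitness(words):
    """Assumed fitness: mean similarity of each word to the other topic words.
    Expects at least two distinct words."""
    return {
        w: sum(path_similarity(w, o) for o in words if o != w) / (len(words) - 1)
        for w in words
    }


def least_fitting(words):
    """Return the word with the lowest fitness, i.e. the removal candidate."""
    fitness = topical_fitness(words)
    return min(fitness, key=fitness.get)


topic = ["dog", "cat", "horse", "sparrow", "invoice"]
print(least_fitting(topic))  # invoice
```

In this toy topic, "invoice" shares only the root concept with the animal words, so it receives the lowest fitness and is the word a system of this kind would propose to remove.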