Release 'Bag-of-Words' Assumption of Latent Dirichlet Allocation

Publication Type:
Conference Proceeding
Citation:
Advances in Intelligent Systems and Computing, 2014, 277 pp. 83 - 92
Issue Date:
2014-01-01
Files:
ISKE.pdf (Accepted Manuscript version, Adobe PDF, 469.08 kB)
Topic models such as latent Dirichlet allocation (LDA) are built on a vector-based representation of documents under the 'bag-of-words' assumption. They discover the distribution of underlying topics in a document and the distribution of keywords in a topic, and have recently proven successful and practical in many scenarios. Compared with a vector-based representation, a graph-based representation preserves more of a document's semantics, because it captures not only the keywords but also the relations between them. In this paper, a topic model for graph-represented documents (GTM) is proposed. In this model, a Bernoulli distribution is used to model the formation of an edge between two keywords in a document. Experimental results show that GTM outperforms LDA on a document classification task in which the topics discovered by each model are used to represent the documents. © Springer-Verlag Berlin Heidelberg 2014.
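As a rough illustration of the two ideas in the abstract (a graph-based document representation and a Bernoulli model of edge formation), the following minimal sketch may help. It is not the GTM from the paper: the function names, the sliding-window rule for linking keywords, and the constant edge probability in the usage lines are all assumptions made for this example.

```python
import math
from itertools import combinations


def build_keyword_graph(tokens, window=1):
    """Build an undirected keyword graph from a token sequence.

    Assumption for this sketch: two distinct keywords are linked when they
    occur within `window` positions of each other. The paper may construct
    its graphs differently.
    """
    nodes = sorted(set(tokens))
    edges = set()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                edges.add(frozenset((word, other)))
    return nodes, edges


def bernoulli_edge_log_likelihood(nodes, edges, edge_prob):
    """Log-likelihood of the observed edges under independent Bernoulli draws.

    `edge_prob(u, v)` is a hypothetical callable returning the probability,
    strictly between 0 and 1, that keywords u and v are connected.
    """
    total = 0.0
    for u, v in combinations(nodes, 2):
        p = edge_prob(u, v)
        if frozenset((u, v)) in edges:
            total += math.log(p)        # edge present
        else:
            total += math.log(1.0 - p)  # edge absent
    return total


# Usage: a three-keyword document with a constant edge probability.
nodes, edges = build_keyword_graph(["topic", "model", "graph"], window=1)
log_lik = bernoulli_edge_log_likelihood(nodes, edges, lambda u, v: 0.8)
```

In a topic model the edge probability would presumably depend on the topics assigned to the two keywords rather than being constant; the constant here only keeps the example self-contained.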