Infinite author topic model based on mixed gamma-negative binomial process

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: Proceedings - IEEE International Conference on Data Mining, ICDM, 2015, pp. 489-498
Issue Date: 2015
Incorporating side information about a text corpus, such as authors, time stamps, and emotional tags, into traditional text mining models has attracted significant interest in information retrieval, statistical natural language processing, and machine learning. One branch of this work is the so-called Author Topic Model (ATM), which incorporates the authors' interests as side information into the classical topic model. However, the existing ATM requires the number of topics to be specified in advance, which is difficult and inappropriate in many real-world settings. In this paper, we propose an Infinite Author Topic (IAT) model to resolve this issue. Instead of assigning a discrete probability distribution over a fixed number of topics, we use a stochastic process to determine the number of topics from the data itself. Specifically, we extend a gamma-negative binomial process to three levels in order to capture the author-document-keyword hierarchical structure. Furthermore, each document is assigned a mixed gamma process that accounts for the contributions of multiple authors to that document. An efficient Gibbs sampling inference algorithm, in which each conditional distribution has a closed form, is developed for the IAT model. Experiments on several real-world datasets show that the IAT model can simultaneously learn the hidden topics, the authors' interests in these topics, and the number of topics.
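The abstract gives no equations, so the following is only a rough illustration of the kind of hierarchy it describes, not the paper's actual model. It is a minimal sketch that assumes a finite truncation to `K` topics in place of the infinite process, and assumes that a document's "mixed" gamma weights are simply the sum of its authors' weights. The function name `sample_iat_counts` and the parameters `gamma0`, `c`, and `p` are hypothetical. The negative binomial counts are drawn through the standard gamma-Poisson mixture.

```python
import numpy as np


def sample_iat_counts(author_sets, K=20, gamma0=1.0, c=1.0, p=0.5, seed=0):
    """Draw per-document topic counts from a truncated, illustrative hierarchy.

    author_sets: list of lists of author indices, one list per document.
    All modelling choices here are assumptions made for illustration only.
    """
    rng = np.random.default_rng(seed)
    num_authors = 1 + max(a for authors in author_sets for a in authors)

    # Global level: finite approximation of a gamma process over K topics.
    # Most weights are close to zero, so only a few topics are really active.
    r_global = rng.gamma(shape=gamma0 / K, scale=1.0 / c, size=K)
    r_global = np.maximum(r_global, 1e-12)  # guard against underflow to 0

    # Author level: each author's topic weights are centred on the global ones.
    r_author = rng.gamma(shape=r_global, scale=1.0, size=(num_authors, K))
    r_author = np.maximum(r_author, 1e-12)

    counts = np.zeros((len(author_sets), K), dtype=np.int64)
    for d, authors in enumerate(author_sets):
        # Document level (assumed): mix authors by summing their weights.
        mixed = r_author[list(authors)].sum(axis=0)
        # NB(mixed, p) counts via a gamma-Poisson mixture.
        rate = rng.gamma(shape=mixed, scale=p / (1.0 - p))
        counts[d] = rng.poisson(rate)
    return r_global, r_author, counts


if __name__ == "__main__":
    docs = [[0, 1], [1], [0, 2]]  # three documents, three authors
    _, _, n = sample_iat_counts(docs, K=10)
    print("topics with at least one word:", int((n.sum(axis=0) > 0).sum()))
```

In this sketch the number of topics that actually receive words is typically smaller than `K` and varies with the data and the seed, which is the behaviour the abstract attributes to replacing a fixed topic count with a stochastic process.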