Dirichlet mixture allocation for multiclass document collections modeling

Publication Type:
Conference Proceeding
Citation:
Proceedings - IEEE International Conference on Data Mining, ICDM, 2009, pp. 711 - 715
Issue Date:
2009-12-01
Latent Dirichlet Allocation (LDA), a topic model, is an effective tool for statistical analysis of large document collections. In LDA, each document is modeled as a mixture of topics, and the topic proportions are drawn from a unimodal Dirichlet prior. When a collection of documents is drawn from multiple classes, this unimodal prior is insufficient to fit the data. To solve this problem, we exploit a multimodal Dirichlet mixture prior and propose Dirichlet mixture allocation (DMA). We report experiments on the popular TDT2 Corpus demonstrating that DMA models a collection of documents more precisely than LDA when the documents are drawn from multiple classes. © 2009 IEEE.
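
The contrast between the two priors can be illustrated with a minimal sketch. It assumes the mixture prior takes the usual form of a weighted sum of Dirichlet components, and it covers only how topic proportions are sampled, not inference or the paper's actual implementation. The function names and all parameter values (three topics, two classes, the concentration vectors) are hypothetical and chosen for illustration.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_dirichlet_mixture(weights, alphas, rng):
    """Pick a mixture component by weight, then draw from its Dirichlet."""
    component = rng.choices(range(len(alphas)), weights=weights, k=1)[0]
    return component, sample_dirichlet(alphas[component], rng)


rng = random.Random(0)

# Hypothetical single Dirichlet prior over 3 topics (the LDA-style case).
lda_alpha = [1.0, 1.0, 1.0]

# Hypothetical two-component mixture: each component concentrates its
# mass on a different topic, giving a prior with two modes.
mixture_weights = [0.5, 0.5]
mixture_alphas = [[20.0, 1.0, 1.0], [1.0, 1.0, 20.0]]

lda_theta = sample_dirichlet(lda_alpha, rng)
component, dma_theta = sample_dirichlet_mixture(mixture_weights, mixture_alphas, rng)

print("single Dirichlet draw:", lda_theta)
print("mixture component:", component, "draw:", dma_theta)
```

Under these assumed settings, repeated draws from the mixture cluster around two different regions of the topic simplex, one per component, which a single Dirichlet with one concentration vector cannot reproduce.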