Dirichlet mixture allocation for multiclass document collections modeling

Publication Type:
Conference Proceeding
Citation:
Proceedings - IEEE International Conference on Data Mining, ICDM, 2009, pp. 711 - 715
Issue Date:
2009-12-01
File: 2011001274OK.pdf (Adobe PDF, 187.77 kB)
The topic model Latent Dirichlet Allocation (LDA) is an effective tool for the statistical analysis of large document collections. In LDA, each document is modeled as a mixture of topics, and the topic proportions are drawn from a unimodal Dirichlet prior. When a collection of documents is drawn from multiple classes, this unimodal prior is insufficient for fitting the data. To solve this problem, we exploit a multimodal Dirichlet mixture prior and propose Dirichlet mixture allocation (DMA). We report experiments on the popular TDT2 Corpus demonstrating that DMA models a collection of documents more precisely than LDA when the documents are obtained from multiple classes. © 2009 IEEE.
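To make the contrast between the two priors concrete, here is a minimal Python sketch using only the standard library. It assumes that DMA keeps LDA's per-word steps and changes only how a document's topic proportions θ are drawn: a mixture component c is first chosen with weights π, and then θ ~ Dir(α_c). The function names, the parameter values, and the reading of components as document classes are illustrative assumptions, not the paper's code. Inference (fitting α and π to a corpus) is not shown.

```python
import random
from typing import List, Sequence, Tuple


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs: Sequence[float], rng: random.Random) -> int:
    """Return index k with probability proportional to probs[k]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def sample_topic_proportions_lda(alpha: Sequence[float], rng: random.Random) -> List[float]:
    # LDA: every document's theta comes from one single Dirichlet prior.
    return sample_dirichlet(alpha, rng)


def sample_topic_proportions_dma(
    weights: Sequence[float],
    alphas: Sequence[Sequence[float]],
    rng: random.Random,
) -> Tuple[int, List[float]]:
    # Dirichlet mixture (hypothetical reading of DMA): pick a component c ~ weights,
    # then draw theta from that component's own Dirichlet, Dir(alphas[c]).
    c = sample_categorical(weights, rng)
    return c, sample_dirichlet(alphas[c], rng)


def generate_document(
    theta: Sequence[float],
    topic_word: Sequence[Sequence[float]],
    length: int,
    rng: random.Random,
) -> List[int]:
    # Per-word steps shared with LDA: topic z ~ theta, then word id w ~ topic_word[z].
    words = []
    for _ in range(length):
        z = sample_categorical(theta, rng)
        words.append(sample_categorical(topic_word[z], rng))
    return words


if __name__ == "__main__":
    rng = random.Random(42)
    # Two hypothetical "classes", each favoring a different topic out of three.
    weights = [0.5, 0.5]
    alphas = [[8.0, 1.0, 1.0], [1.0, 1.0, 8.0]]
    for _ in range(5):
        c, theta = sample_topic_proportions_dma(weights, alphas, rng)
        print(c, [round(t, 2) for t in theta])
```

With two components concentrated on different topics, the pooled θ draws fall into two separate clusters. The abstract's argument is that a single unimodal Dirichlet, as in LDA, fits such class-dependent proportions poorly, which is what the mixture prior is meant to address.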