Dirichlet mixture allocation for multiclass document collections modeling
- Publication Type:
- Conference Proceeding
- Proceedings - IEEE International Conference on Data Mining, ICDM, 2009, pp. 711 - 715
The topic model Latent Dirichlet Allocation (LDA) is an effective tool for the statistical analysis of large document collections. In LDA, each document is modeled as a mixture of topics, and the topic proportions are drawn from a unimodal Dirichlet prior. When a collection of documents is drawn from multiple classes, this unimodal prior is insufficient to fit the data. To solve this problem, we exploit a multimodal Dirichlet mixture prior and propose Dirichlet mixture allocation (DMA). We report experiments on the popular TDT2 corpus demonstrating that DMA models a collection of documents more precisely than LDA when the documents come from multiple classes. © 2009 IEEE.
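The difference between the two priors can be illustrated with a small sampling sketch. The abstract does not spell out the full generative process, so the code below assumes DMA follows LDA's usual steps and only swaps the single Dirichlet prior for a mixture of Dirichlets. All names (`sample_dirichlet`, `sample_document`) and all numeric values are hypothetical and chosen for illustration; none of them come from the paper.

```python
import random


def sample_dirichlet(rng, alpha):
    """Draw one vector from Dirichlet(alpha) by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_document(rng, mixture_weights, alphas, topic_word, n_words):
    """Generate one document under an assumed Dirichlet mixture prior.

    With a single mixture component this reduces to the LDA-style
    unimodal Dirichlet prior.
    """
    # Pick which Dirichlet component generates this document's proportions.
    component = rng.choices(range(len(alphas)), weights=mixture_weights)[0]
    # Draw topic proportions from the chosen component.
    theta = sample_dirichlet(rng, alphas[component])
    words = []
    for _ in range(n_words):
        # Pick a topic for this word, then a word id from that topic.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        w = rng.choices(range(len(topic_word[z])), weights=topic_word[z])[0]
        words.append(w)
    return component, theta, words


# Illustrative (assumed) parameters: 2 components, 3 topics, 5 word ids.
MIXTURE_WEIGHTS = [0.5, 0.5]
ALPHAS = [
    [8.0, 1.0, 1.0],  # component 0 favors topic 0
    [1.0, 1.0, 8.0],  # component 1 favors topic 2
]
TOPIC_WORD = [
    [0.6, 0.3, 0.1, 0.0, 0.0],
    [0.0, 0.2, 0.6, 0.2, 0.0],
    [0.0, 0.0, 0.1, 0.3, 0.6],
]

if __name__ == "__main__":
    rng = random.Random(0)
    component, theta, words = sample_document(
        rng, MIXTURE_WEIGHTS, ALPHAS, TOPIC_WORD, n_words=20
    )
    print("component:", component)
    print("topic proportions:", [round(t, 3) for t in theta])
    print("word ids:", words)
```

Because each component concentrates its mass on a different region of the topic simplex, the resulting prior over topic proportions has more than one mode, which is the property the abstract argues a single Dirichlet lacks for multiclass collections.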