Hierarchical evolving Dirichlet processes for modeling nonlinear evolutionary traces in temporal data

Publication Type: Journal Article
Citation: Data Mining and Knowledge Discovery, 2017, 31 (1), pp. 32-64
Issue Date: 2017-01-01
© 2016, The Author(s).

Clustering analysis aims to group similar data objects into the same cluster. Topic models, which are soft clustering methods, are powerful tools for discovering latent clusters (topics) in large data sets. Because temporal data are dynamic, clusters often exhibit complicated patterns such as birth, branching, and death. However, most existing temporal clustering models assume that clusters evolve as a linear chain, so they cannot model or detect the branching of clusters. In this paper, we present evolving Dirichlet processes (EDP for short) to model nonlinear evolutionary traces in temporal data, especially temporal text collections. In the EDP setting, temporal collections are divided into epochs. To model cluster branching over time, EDP allows each cluster in an epoch to form a Dirichlet process (DP) and uses a combination of the cluster-specific DPs as the prior for cluster distributions in the next epoch. To model hierarchical temporal data, such as online document collections, we propose a new class of evolving hierarchical Dirichlet processes (EHDP for short) that extends the hierarchical Dirichlet process (HDP) to evolving temporal data. We design an online learning framework based on Gibbs sampling to infer the evolutionary traces of clusters over time. In experiments, we validate that EDP and EHDP can capture nonlinear evolutionary traces of clusters on both synthetic and real-world text collections and achieve better results than their peers.
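
The branching idea in the abstract can be illustrated with a toy sketch. This is not the paper's model or its Gibbs sampler: it only assumes a Chinese-restaurant-style assignment within each epoch, and it assumes that every newly opened cluster picks a parent among the previous epoch's clusters with probability proportional to the parent's size. The names `sample_parent`, `assign_epoch`, and the concentration parameter `alpha` are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def sample_parent(prev_counts, rng):
    """Pick a parent cluster from the previous epoch, weighted by its size.

    Assumption for illustration: parent weights are proportional to size.
    """
    clusters = list(prev_counts)
    weights = [prev_counts[c] for c in clusters]
    return rng.choices(clusters, weights=weights, k=1)[0]


def assign_epoch(n_points, prev_counts, alpha, rng):
    """Chinese-restaurant-style cluster assignment for one epoch.

    Each point joins an existing cluster of this epoch with probability
    proportional to the cluster's size, or opens a new cluster with
    probability proportional to alpha. A new cluster records a parent
    from the previous epoch (None if there is no previous epoch).
    """
    counts = Counter()   # cluster id -> number of points in this epoch
    parent_of = {}       # cluster id -> parent cluster id in previous epoch
    for _ in range(n_points):
        total = sum(counts.values()) + alpha
        r = rng.random() * total
        chosen = None
        for cid, cnt in counts.items():
            if r < cnt:
                chosen = cid
                break
            r -= cnt
        if chosen is None:
            # Open a new cluster; ids are assigned in order of creation.
            chosen = len(parent_of)
            parent_of[chosen] = (
                sample_parent(prev_counts, rng) if prev_counts else None
            )
        counts[chosen] += 1
    return counts, parent_of


if __name__ == "__main__":
    rng = random.Random(0)
    epoch1, _ = assign_epoch(50, {}, alpha=1.0, rng=rng)
    epoch2, parents = assign_epoch(50, epoch1, alpha=1.0, rng=rng)
    # A parent that appears more than once has branched into several clusters.
    branching = Counter(parents.values())
    print("epoch 2 clusters and their parents:", parents)
    print("children per parent:", dict(branching))
```

In this sketch, a nonlinear trace appears whenever two clusters in one epoch share the same parent, which is the kind of branching a linear-chain model cannot represent.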