Bayesian Nonparametric Relational Topic Model through Dependent Gamma Processes

Xuan, J; Lu, J; Zhang, G; Xu, RYD; Luo, X

Publication Type: Journal Article
Citation: IEEE Transactions on Knowledge and Data Engineering, 2017, 29 (7), pp. 1357-1369
Issue Date: 2017-07-01

This item is open access.


Full metadata record

Field	Value	Language
dc.contributor.author	Xuan, J https://orcid.org/0000-0002-8367-6908	en_US
dc.contributor.author	Lu, J https://orcid.org/0000-0003-0690-4732	en_US
dc.contributor.author	Zhang, G https://orcid.org/0000-0003-3960-0583	en_US
dc.contributor.author	Xu, RYD	en_US
dc.contributor.author	Luo, X	en_US
dc.date.issued	2017-07-01	en_US
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering, 2017, 29 (7), pp. 1357 - 1369	en_US
dc.identifier.issn	1041-4347	en_US
dc.identifier.uri	http://hdl.handle.net/10453/100173
dc.relation	http://purl.org/au-research/grants/arc/DP140101366
dc.relation.ispartof	IEEE Transactions on Knowledge and Data Engineering	en_US
dc.relation.isbasedon	10.1109/TKDE.2016.2636182	en_US
dc.subject.classification	Information Systems	en_US
dc.title	Bayesian Nonparametric Relational Topic Model through Dependent Gamma Processes	en_US
dc.type	Journal Article
utslib.citation.volume	29	en_US
utslib.for	08 Information and Computing Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
utslib.copyright.status	open_access
pubs.issue	7	en_US
pubs.publication-status	Published	en_US
pubs.volume	29	en_US

Abstract:

© 2016 IEEE. Traditional relational topic models provide a successful way to discover the hidden topics in a document network. Many theoretical and practical tasks, such as dimensionality reduction, document clustering, and link prediction, can benefit from this revealed knowledge. However, existing relational topic models assume that the number of hidden topics is known a priori, which is impractical in many real-world applications. To relax this assumption, we propose a nonparametric relational topic model that uses stochastic processes instead of fixed-dimensional probability distributions. Specifically, each document is assigned a Gamma process, which represents the topic interest of that document. Although this method provides an elegant solution, it brings additional challenges when mathematically modeling the inherent network structure of a typical document network, i.e., two documents that are closer in the network tend to have more similar topics. Furthermore, we require that the topics be shared by all the documents. To resolve these challenges, we use a subsampling strategy to assign each document a different Gamma process derived from the global Gamma process, and the subsampling probabilities of documents are subject to a Markov Random Field constraint that inherits the document network structure. Through the designed posterior inference algorithm, we can discover the hidden topics and their number simultaneously. Experimental results on both synthetic and real-world network datasets demonstrate the model's capability to learn the hidden topics and, more importantly, the number of topics.
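
The subsampling idea in the abstract can be illustrated with a small generative sketch. This is not the authors' model or inference algorithm; it is a toy under stated assumptions. The global Gamma process is replaced by a finite approximation with a fixed number of atoms, and the Markov Random Field constraint is replaced by a crude neighbor-averaging step that only pulls linked documents' subsampling probabilities toward each other. All function names and parameters (`num_atoms`, `alpha`, `sweeps`, `mix`) are hypothetical.

```python
import random


def sample_global_weights(num_atoms, alpha, rng):
    """Finite stand-in for a global Gamma process (assumption):
    num_atoms independent Gamma(alpha / num_atoms, 1) topic weights."""
    return [rng.gammavariate(alpha / num_atoms, 1.0) for _ in range(num_atoms)]


def smooth_over_network(probs, neighbors, sweeps, mix):
    """Crude stand-in for the MRF constraint (assumption): repeatedly mix each
    document's subsampling probabilities with the mean of its neighbors'."""
    for _ in range(sweeps):
        updated = []
        for doc, row in enumerate(probs):
            nbrs = neighbors[doc]
            if not nbrs:
                # Isolated documents keep their own probabilities.
                updated.append(list(row))
                continue
            new_row = []
            for k, p in enumerate(row):
                nbr_mean = sum(probs[n][k] for n in nbrs) / len(nbrs)
                new_row.append((1.0 - mix) * p + mix * nbr_mean)
            updated.append(new_row)
        probs = updated
    return probs


def subsample_document_weights(global_weights, doc_probs, rng):
    """Keep each global atom with the document's probability, else drop it."""
    return [w if rng.random() < q else 0.0
            for w, q in zip(global_weights, doc_probs)]


def generate(neighbors, num_atoms=20, alpha=5.0, sweeps=3, mix=0.5, seed=0):
    """Return (global weights, subsampling probabilities, per-document weights).

    neighbors[d] lists the documents linked to document d."""
    rng = random.Random(seed)
    global_weights = sample_global_weights(num_atoms, alpha, rng)
    probs = [[rng.random() for _ in range(num_atoms)] for _ in neighbors]
    probs = smooth_over_network(probs, neighbors, sweeps, mix)
    doc_weights = [subsample_document_weights(global_weights, row, rng)
                   for row in probs]
    return global_weights, probs, doc_weights
```

For example, `generate([[1], [0, 2], [1], []])` describes a chain of three linked documents plus one isolated document. Every document draws its topic weights from the same global atoms, so topics are shared, while linked documents tend to retain overlapping subsets of them. Because the number of atoms is fixed here, the sketch does not show how the number of topics is learned; that is left to the paper's posterior inference.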

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/100173