Release 'Bag-of-Words' Assumption of Latent Dirichlet Allocation

Xuan, J; Lu, J; Zhang, G; Luo, X

Publication Type: Conference Proceeding
Citation: Advances in Intelligent Systems and Computing, 2014, 277, pp. 83-92
Issue Date: 2014-01-01

Closed Access

Filename	Description	Size	Format
ISKE.pdf	Accepted Manuscript version	469.08 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Xuan, J https://orcid.org/0000-0002-8367-6908	en_US
dc.contributor.author	Lu, J https://orcid.org/0000-0003-0690-4732	en_US
dc.contributor.author	Zhang, G https://orcid.org/0000-0003-3960-0583	en_US
dc.contributor.author	Luo, X	en_US
dc.date.issued	2014-01-01	en_US
dc.identifier.citation	Advances in Intelligent Systems and Computing, 2014, 277 pp. 83 - 92	en_US
dc.identifier.isbn	9783642549236	en_US
dc.identifier.issn	2194-5357	en_US
dc.identifier.uri	http://hdl.handle.net/10453/35660
dc.description.abstract	Topic models such as latent Dirichlet allocation (LDA) are built on a vector-based representation of documents under the 'bag-of-words' assumption. They discover the distribution of underlying topics in a document and the distribution of keywords in a topic, and have recently proved successful and practical in many scenarios. Compared with a vector-based representation, a graph-based representation preserves more of a document's semantics, because it considers not only the keywords but also the relations between them. In this paper, a topic model for graph-represented documents (GTM) is proposed. In this model, a Bernoulli distribution is used to model the formation of an edge between two keywords in a document. Experimental results show that GTM outperforms LDA on a document classification task in which the topics uncovered by each model are used to represent the documents. © Springer-Verlag Berlin Heidelberg 2014.	en_US
dc.relation.ispartof	Advances in Intelligent Systems and Computing	en_US
dc.relation.isbasedon	10.1007/978-3-642-54924-3_8	en_US
dc.title	Release 'Bag-of-Words' Assumption of Latent Dirichlet Allocation	en_US
dc.type	Conference Proceeding
utslib.citation.volume	277	en_US
utslib.for	080105 Expert Systems	en_US
utslib.for	080108 Neural, Evolutionary and Fuzzy Computation	en_US
utslib.for	080605 Decision Support and Group Support Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	277	en_US

Abstract:

Topic models such as latent Dirichlet allocation (LDA) are built on a vector-based representation of documents under the 'bag-of-words' assumption. They discover the distribution of underlying topics in a document and the distribution of keywords in a topic, and have recently proved successful and practical in many scenarios. Compared with a vector-based representation, a graph-based representation preserves more of a document's semantics, because it considers not only the keywords but also the relations between them. In this paper, a topic model for graph-represented documents (GTM) is proposed. In this model, a Bernoulli distribution is used to model the formation of an edge between two keywords in a document. Experimental results show that GTM outperforms LDA on a document classification task in which the topics uncovered by each model are used to represent the documents. © Springer-Verlag Berlin Heidelberg 2014.
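
The contrast the abstract draws between the two representations can be illustrated with a small sketch. It is not the GTM model from the paper, whose details are not given in this record. The sketch assumes that two keywords are related when they share a sentence, and it uses a single hypothetical edge probability `p` for every keyword pair. The function names and toy documents are likewise illustrative.

```python
from collections import Counter
from itertools import combinations
import math


def flatten(sentences):
    """Concatenate tokenized sentences into one token list."""
    return [tok for sent in sentences for tok in sent]


def bag_of_words(tokens):
    """Vector-style view: keyword counts only; relations are discarded."""
    return Counter(tokens)


def keyword_graph(sentences):
    """Graph-style view: keywords are nodes, and an undirected edge links
    two keywords that share a sentence (an assumed notion of relation)."""
    nodes, edges = set(), set()
    for sent in sentences:
        unique = sorted(set(sent))
        nodes.update(unique)
        # Pairs are stored in sorted order so (a, b) and (b, a) coincide.
        edges.update(combinations(unique, 2))
    return nodes, edges


def edge_log_likelihood(nodes, edges, p):
    """Log-likelihood of the observed edges if every keyword pair forms an
    edge independently with Bernoulli probability p (a hypothetical
    simplification, not the paper's parameterization)."""
    total = 0.0
    for pair in combinations(sorted(nodes), 2):
        total += math.log(p) if pair in edges else math.log(1.0 - p)
    return total


# Two toy documents with identical keyword counts but different relations.
doc_a = [["topic", "model"], ["graph", "edge"]]
doc_b = [["topic", "graph"], ["model", "edge"]]

same_counts = bag_of_words(flatten(doc_a)) == bag_of_words(flatten(doc_b))
same_edges = keyword_graph(doc_a)[1] == keyword_graph(doc_b)[1]
print(same_counts, same_edges)
```

The two toy documents are indistinguishable as bags of words yet differ as graphs, which is the extra information a graph-based topic model can draw on.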

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/35660