A semi-supervised Hidden Markov topic model based on prior knowledge

Seifollahi, S; Piccardi, M; Borzeshi, EZ

Publication Type: Conference Proceeding
Citation: Communications in Computer and Information Science, 2018, vol. 845, pp. 265-276
Issue Date: 2018-01-01

Full metadata record

Field	Value	Language
dc.contributor.author	Seifollahi, S https://orcid.org/0000-0002-5325-9724	en_US
dc.contributor.author	Piccardi, M https://orcid.org/0000-0001-9250-6604	en_US
dc.contributor.author	Borzeshi, EZ	en_US
dc.date.issued	2018-01-01	en_US
dc.identifier.citation	Communications in Computer and Information Science, 2018, 845 pp. 265 - 276	en_US
dc.identifier.isbn	9789811302916	en_US
dc.identifier.issn	1865-0929	en_US
dc.identifier.uri	http://hdl.handle.net/10453/117685
dc.relation.ispartof	Communications in Computer and Information Science	en_US
dc.relation.isbasedon	10.1007/978-981-13-0292-3_17	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	A semi-supervised Hidden Markov topic model based on prior knowledge	en_US
dc.type	Conference Proceeding
utslib.citation.volume	845	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access	*
pubs.publication-status	Published	en_US
pubs.volume	845	en_US

Abstract:

© Springer Nature Singapore Pte Ltd. 2018. A topic model is an unsupervised model to automatically discover the topics discussed in a collection of documents. Most of the existing topic models only use bag-of-words representations or single-word distributions and do not consider relations between words in the model. As a consequence, these models may generate topics which are not in good agreement with human-judged topic coherence. To mitigate this issue, we present a topic model which employs topically-related knowledge from prior topics and words’ co-occurrence/relations in the collection. To incorporate the prior knowledge, we leverage a two-staged semi-supervised Markov topic model. In the first stage, we estimate a transition matrix and a low-dimensional vocabulary for the final topic model. In the second stage, we produce the final topic model where the topic assignment is performed following a Markov chain process. Experiments on real text documents from a major compensation agency demonstrate improvements of both the PMI score measure and the topic coherence.
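
The abstract reports results in terms of a PMI score. As a rough illustration only, the sketch below computes a common document-level form of pointwise mutual information, PMI(w1, w2) = log(p(w1, w2) / (p(w1) p(w2))), averaged over all pairs of a topic's top words. The record does not state which PMI variant or reference corpus the paper uses, so the function name `pmi_score`, the document-level co-occurrence estimate, and the choice to skip pairs that never co-occur are all assumptions made for this example.

```python
import math
from itertools import combinations


def pmi_score(top_words, documents):
    """Average pairwise PMI of a topic's top words.

    Probabilities are estimated from document-level (co-)occurrence counts.
    This is an illustrative variant, not necessarily the one used in the paper.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)
    if n_docs == 0:
        return 0.0

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1 = doc_freq(w1) / n_docs
        p2 = doc_freq(w2) / n_docs
        p12 = doc_freq(w1, w2) / n_docs
        if p12 == 0:
            # Assumption: pairs that never co-occur are skipped rather than
            # smoothed, to avoid taking log(0).
            continue
        scores.append(math.log(p12 / (p1 * p2)))

    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical toy corpus; each document is a list of tokens.
    docs = [
        ["claim", "injury", "payment"],
        ["claim", "injury"],
        ["weather", "rain"],
        ["rain", "weather", "claim"],
    ]
    print(pmi_score(["weather", "rain"], docs))
    print(pmi_score(["claim", "injury"], docs))
```

Higher values indicate that a topic's top words tend to appear in the same documents more often than chance would predict, which is why PMI-style measures are used as a proxy for topic coherence.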

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/117685