Does deep learning help topic extraction? A kernel k-means clustering method with word embedding

Zhang, Y; Lu, J; Liu, F; Liu, Q; Porter, A; Chen, H; Zhang, G

Publication Type: Journal Article
Citation: Journal of Informetrics, 2018, 12 (4), pp. 1099-1117
Issue Date: 2018-11-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, Y https://orcid.org/0000-0002-7731-0301	en_US
dc.contributor.author	Lu, J https://orcid.org/0000-0003-0690-4732	en_US
dc.contributor.author	Liu, F https://orcid.org/0000-0002-5005-9129	en_US
dc.contributor.author	Liu, Q	en_US
dc.contributor.author	Porter, A	en_US
dc.contributor.author	Chen, H https://orcid.org/0000-0002-0893-1817	en_US
dc.contributor.author	Zhang, G https://orcid.org/0000-0003-3960-0583	en_US
dc.date.issued	2018-11-01	en_US
dc.identifier.citation	Journal of Informetrics, 2018, 12 (4), pp. 1099 - 1117	en_US
dc.identifier.issn	1751-1577	en_US
dc.identifier.uri	http://hdl.handle.net/10453/129265
dc.relation	http://purl.org/au-research/grants/arc/DP150101645
dc.relation.ispartof	Journal of Informetrics	en_US
dc.relation.isbasedon	10.1016/j.joi.2018.09.004	en_US
dc.subject.classification	Information & Library Sciences	en_US
dc.title	Does deep learning help topic extraction? A kernel k-means clustering method with word embedding	en_US
dc.type	Journal Article
utslib.citation.volume	4	en_US
utslib.citation.volume	12	en_US
utslib.for	0807 Library and Information Studies	en_US
utslib.for	0102 Applied Mathematics	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (Research)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.issue	4	en_US
pubs.publication-status	Published	en_US
pubs.volume	12	en_US

Abstract:

© 2018 All rights reserved. Topic extraction presents challenges for the bibliometric community, and its performance still depends on human intervention and on the area of application. This paper proposes a novel kernel k-means clustering method that incorporates a word embedding model to extract topics effectively from bibliometric data. Experiments comparing this method with four clustering baselines (k-means, fuzzy c-means, principal component analysis, and topic models) on two bibliometric datasets demonstrate its effectiveness both across a relatively broad range of disciplines and within a given domain. An empirical study of topic extraction from articles published in three top-tier bibliometric journals between 2000 and 2017, supported by expert knowledge-based evaluations, provides supplemental evidence of the method's capability. Additionally, this empirical analysis reveals both overlapping and diverse research interests among the three journals, insights that would benefit journal publishers, editorial boards, and research communities.
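
The general idea named in the abstract, kernel k-means applied to embedding-based document vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the kernel, the embedding model, or the initialization, so the RBF kernel, the mean-of-word-vectors document representation, the farthest-first seeding, and the hand-made `toy_embeddings` table below are all assumptions chosen only to keep the example small and deterministic.

```python
import numpy as np

# Hypothetical 2-D word vectors; a real run would load a trained embedding model.
toy_embeddings = {
    "citation": [1.0, 0.0], "impact": [0.9, 0.1], "journal": [0.8, 0.2],
    "neural": [0.0, 1.0], "network": [0.1, 0.9], "embedding": [0.2, 0.8],
}

# Toy "documents" given as token lists (illustrative data, not from the paper).
docs = [
    ["citation", "impact"], ["journal", "citation"], ["impact", "journal"],
    ["neural", "network"], ["embedding", "neural"], ["network", "embedding"],
]


def embed(tokens):
    """Represent a document as the mean of its word vectors (an assumption)."""
    return np.mean([toy_embeddings[t] for t in tokens], axis=0)


def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix; the kernel choice is an assumption."""
    sq_norms = np.sum(X * X, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def kernel_kmeans(K, n_clusters, n_iter=50):
    """Lloyd-style kernel k-means on a precomputed kernel matrix K (n x n)."""
    diag = np.diag(K)
    # Farthest-first seeding in feature space (an illustrative, deterministic choice).
    seeds = [0]
    nearest = diag - 2.0 * K[:, 0] + diag[0]
    for _ in range(1, n_clusters):
        nxt = int(np.argmax(nearest))
        seeds.append(nxt)
        nearest = np.minimum(nearest, diag - 2.0 * K[:, nxt] + diag[nxt])
    # Initial labels: nearest seed by squared feature-space distance.
    labels = (diag[:, None] - 2.0 * K[:, seeds] + diag[seeds][None, :]).argmin(axis=1)
    for _ in range(n_iter):
        dist = np.full((K.shape[0], n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # an empty cluster attracts no points this round
            # ||phi(x_i) - mu_c||^2 written with kernel evaluations only
            cross = K[:, members].sum(axis=1) / size
            within = K[np.ix_(members, members)].sum() / size**2
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels


X = np.array([embed(d) for d in docs])
K = rbf_kernel(X, gamma=1.0)
labels = kernel_kmeans(K, n_clusters=2)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

On this toy input the three bibliometrics-flavoured documents fall into one cluster and the three machine-learning-flavoured documents into the other. Working from the kernel matrix alone means cluster centroids are never formed explicitly, which is what lets the same loop run with any positive semi-definite kernel.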

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/129265