Sparse embedded k-means clustering

Liu, W; Shen, X; Tsang, IW

Publication Type: Conference Proceeding
Citation: Advances in Neural Information Processing Systems, 2017, 2017-December pp. 3320 - 3328
Issue Date: 2017-01-01

Closed Access

	Filename	Description	Size	Format
	6924-sparse-embedded-k-means-clustering.pdf	Published version	366.74 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Liu, W	en_US
dc.contributor.author	Shen, X https://orcid.org/0000-0001-8494-4532	en_US
dc.contributor.author	Tsang, IW https://orcid.org/0000-0001-8095-4637	en_US
dc.date.issued	2017-01-01	en_US
dc.identifier.citation	Advances in Neural Information Processing Systems, 2017, 2017-December pp. 3320 - 3328	en_US
dc.identifier.issn	1049-5258	en_US
dc.identifier.uri	http://hdl.handle.net/10453/127063
dc.description.abstract	© 2017 Neural Information Processing Systems Foundation. All rights reserved. The k-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its application in broad domains. Researchers have successfully addressed this obstacle with dimensionality reduction methods. Recently, [1] developed a state-of-the-art random projection (RP) method for faster k-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, [1] reduce the running time by a factor of min{n, d} ε² log(d)/k for a data matrix X ∈ ℝ^{n×d} with n data points and d features, while losing only a factor of one in approximation accuracy. Unfortunately, they still require O(ndk/(ε² log(d))) time for matrix multiplication, and this cost will be prohibitive for large values of n and d. To break this bottleneck, we carefully build a sparse embedded k-means clustering algorithm which requires O(nnz(X)) time for fast matrix multiplication, where nnz(X) denotes the number of non-zeros in X. Moreover, our proposed algorithm improves on [1]'s results for approximation accuracy by a factor of one. Our empirical studies corroborate our theoretical findings and demonstrate that our approach is able to significantly accelerate k-means clustering while achieving satisfactory clustering performance.	en_US
dc.relation	http://purl.org/au-research/grants/arc/FT130100746
dc.relation	http://purl.org/au-research/grants/arc/LP150100671
dc.relation	http://purl.org/au-research/grants/arc/DP180100106
dc.relation.ispartof	Advances in Neural Information Processing Systems	en_US
dc.title	Sparse embedded k-means clustering	en_US
dc.type	Conference Proceeding
utslib.citation.volume	2017-December	en_US
utslib.for	1701 Psychology	en_US
utslib.for	1702 Cognitive Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	2017-December	en_US

Abstract:

© 2017 Neural Information Processing Systems Foundation. All rights reserved. The k-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its application in broad domains. Researchers have successfully addressed this obstacle with dimensionality reduction methods. Recently, [1] developed a state-of-the-art random projection (RP) method for faster k-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, [1] reduce the running time by a factor of min{n, d} ε² log(d)/k for a data matrix X ∈ ℝ^{n×d} with n data points and d features, while losing only a factor of one in approximation accuracy. Unfortunately, they still require O(ndk/(ε² log(d))) time for matrix multiplication, and this cost will be prohibitive for large values of n and d. To break this bottleneck, we carefully build a sparse embedded k-means clustering algorithm which requires O(nnz(X)) time for fast matrix multiplication, where nnz(X) denotes the number of non-zeros in X. Moreover, our proposed algorithm improves on [1]'s results for approximation accuracy by a factor of one. Our empirical studies corroborate our theoretical findings and demonstrate that our approach is able to significantly accelerate k-means clustering while achieving satisfactory clustering performance.
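
The abstract does not describe how the sparse embedding is built. As a minimal sketch, assuming a CountSketch-style map in which every feature is sent to a single reduced coordinate with a random sign (a common sparse embedding, not necessarily the authors' construction), the Python below illustrates why applying such a map can cost time proportional to nnz(X). The names make_sparse_embedding and embed_rows, the parameter d_reduced, and the dict-based row storage are all hypothetical.

```python
import random


def make_sparse_embedding(d, d_reduced, seed=0):
    """Draw an assumed CountSketch-style map (hypothetical helper).

    Each of the d input features is assigned one target coordinate in
    range(d_reduced) and one random sign.
    """
    rng = random.Random(seed)
    buckets = [rng.randrange(d_reduced) for _ in range(d)]
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    return buckets, signs


def embed_rows(sparse_rows, buckets, signs, d_reduced):
    """Embed rows stored as {feature_index: value} dicts.

    Every non-zero entry is touched exactly once, so the work is
    O(nnz(X)) plus the cost of allocating the n x d_reduced output.
    """
    embedded = []
    for row in sparse_rows:
        out = [0.0] * d_reduced
        for j, value in row.items():
            out[buckets[j]] += signs[j] * value
        embedded.append(out)
    return embedded


# Toy data (made up for illustration): 3 points, 1000 features, few non-zeros.
X = [
    {0: 1.0, 17: 2.0, 999: -1.0},
    {5: 0.5, 17: 1.5},
    {400: 3.0},
]
buckets, signs = make_sparse_embedding(d=1000, d_reduced=8)
X_small = embed_rows(X, buckets, signs, d_reduced=8)
```

Under these assumptions, k-means would then be run on X_small rather than on X, so each distance computation involves d_reduced coordinates instead of d.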

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/127063