Compressed K-Means for large-scale clustering

Shen, X; Liu, W; Tsang, I; Shen, F; Sun, QS

Publication Type: Conference Proceeding
Citation: 31st AAAI Conference on Artificial Intelligence, AAAI 2017, 2017, pp. 2527-2533
Issue Date: 2017-01-01

Filename	Description	Size
AAAI17.pdf	Published version	1.8 MB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Shen, X https://orcid.org/0000-0001-8494-4532	en_US
dc.contributor.author	Liu, W	en_US
dc.contributor.author	Tsang, I https://orcid.org/0000-0001-8095-4637	en_US
dc.contributor.author	Shen, F	en_US
dc.contributor.author	Sun, QS	en_US
dc.date.issued	2017-01-01	en_US
dc.identifier.citation	31st AAAI Conference on Artificial Intelligence, AAAI 2017, 2017, pp. 2527 - 2533	en_US
dc.identifier.uri	http://hdl.handle.net/10453/105908
dc.description.abstract	Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Large-scale clustering has been widely used in many applications, and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.	en_US
dc.relation.ispartof	31st AAAI Conference on Artificial Intelligence, AAAI 2017	en_US
dc.title	Compressed K - Means for large-scale clustering	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Large-scale clustering has been widely used in many applications, and has received much attention. Most existing clustering methods suffer from both expensive computation and memory costs when applied to large-scale datasets. In this paper, we propose a novel clustering method, dubbed compressed k-means (CKM), for fast large-scale clustering. Specifically, high-dimensional data are compressed into short binary codes, which are well suited for fast clustering. CKM enjoys two key benefits: 1) storage can be significantly reduced by representing data points as binary codes; 2) distance computation is very efficient using the Hamming metric between binary codes. We propose to jointly learn binary codes and clusters within one framework. Extensive experimental results on four large-scale datasets, including two million-scale datasets, demonstrate that CKM outperforms the state-of-the-art large-scale clustering methods in terms of both computation and memory cost, while achieving comparable clustering accuracy.
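
To illustrate the two benefits named in the abstract, the following minimal Python sketch stores each point as a short binary code and assigns it to the nearest cluster center by Hamming distance. It is not the authors' CKM algorithm, which learns the codes and clusters jointly. The codes and centers here are fixed by hand, and the helper names (to_code, hamming, assign) are hypothetical.

```python
def to_code(bits):
    """Pack a sequence of 0/1 values into one integer (first bit most significant)."""
    code = 0
    for bit in bits:
        code = (code << 1) | (1 if bit else 0)
    return code


def hamming(a, b):
    """Count the bit positions in which two codes differ (XOR, then popcount)."""
    return bin(a ^ b).count("1")


def assign(codes, centers):
    """Return, for each code, the index of the nearest center in Hamming distance."""
    return [
        min(range(len(centers)), key=lambda j: hamming(code, centers[j]))
        for code in codes
    ]


# Illustrative 10-bit codes (assumed data, not taken from the paper).
codes = [
    to_code([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
    to_code([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
    to_code([0, 0, 0, 0, 0, 0, 0, 0, 0, 1]),
    to_code([1, 1, 1, 1, 1, 1, 1, 1, 0, 1]),
]
centers = [to_code([0] * 10), to_code([1] * 10)]

labels = assign(codes, centers)
print(labels)
```

Each distance evaluation reduces to one XOR and a bit count over a few machine words, rather than a floating-point computation over every dimension of the original vectors. That is the source of the speed and storage savings the abstract describes.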

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/105908