Sparse embedded k-means clustering

Publication Type:
Conference Proceeding
Citation:
Advances in Neural Information Processing Systems, 2017, 2017-December pp. 3320 - 3328
Issue Date:
2017-01-01
Filename Description Size
6924-sparse-embedded-k-means-clustering.pdfPublished version366.74 kB
Adobe PDF
Full metadata record
© 2017 Neural information processing systems foundation. All rights reserved. The k-means clustering algorithm is a ubiquitous tool in data mining and machine learning that shows promising performance. However, its high computational cost has hindered its applications in broad domains. Researchers have successfully addressed these obstacles with dimensionality reduction methods. Recently, [1] develop a state-of-the-art random projection (RP) method for faster k-means clustering. Their method delivers many improvements over other dimensionality reduction methods. For example, compared to the advanced singular value decomposition based feature extraction approach, [1] reduce the running time by a factor of min{n, d}ϵ2log(d)/k for data matrix X ϵ ℝn×d with n data points and d features, while losing only a factor of one in approximation accuracy. Unfortunately, they still require O (ndk/ϵ2log (d) for matrix multiplication and this cost will be prohibitive for large values of n and d. To break this bottleneck, we carefully build a sparse embedded k-means clustering algorithm which requires O(nnz(X)) (nnz(X) denotes the number of non-zeros in X) for fast matrix multiplication. Moreover, our proposed algorithm improves on [1]'s results for approximation accuracy by a factor of one. Our empirical studies corroborate our theoretical findings, and demonstrate that our approach is able to significantly accelerate k-means clustering, while achieving satisfactory clustering performance.
Please use this identifier to cite or link to this item: