Clustering high-dimensional data with low-order neighbors

Zhao, Y; Zhang, C; Shen, YD


Publication Type: Conference Proceeding
Citation: Proceedings - IEEE/WIC/ACM International Conference on Web Intelligence, WI 2004, 2004, pp. 103-109
Issue Date: 2004-12-01

Closed Access

File: 2004001583.pdf (439.3 kB, Adobe PDF)


This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhao, Y	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.contributor.author	Shen, YD	en_US
dc.date.issued	2004-12-01	en_US
dc.identifier.citation	Proceedings - IEEE/WIC/ACM International Conference on Web Intelligence, WI 2004, 2004, pp. 103 - 109	en_US
dc.identifier.isbn	0769521002	en_US
dc.identifier.uri	http://hdl.handle.net/10453/2695
dc.relation	http://purl.org/au-research/grants/arc/DP0449535
dc.relation.ispartof	Proceedings - IEEE/WIC/ACM International Conference on Web Intelligence, WI 2004	en_US
dc.title	Clustering high-dimensional data with low-order neighbors	en_US
dc.type	Conference Proceeding
utslib.for	080109 Pattern Recognition and Data Mining	en_US
dc.location.activity	Beijing, China	en_US
dc.location.activity	Las Vegas, USA
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Density-based and grid-based clustering are two main clustering approaches. The former is known for its ability to discover clusters of various shapes and to eliminate noise, while the latter is known for its high speed. Combining the two approaches seems to provide better clustering results. To the best of our knowledge, however, all existing algorithms that combine density-based and grid-based clustering take cells as atomic units, in the sense that either all objects in a cell belong to a cluster or no object in the cell belongs to any cluster. This requires the cells to be small enough to ensure a fine resolution of results. In high-dimensional spaces, however, the number of cells can be very large when cells are small, which makes the clustering process extremely costly. Moreover, the number of neighbors of a cell grows exponentially with the dimensionality of the dataset, which increases the complexity further. In this paper, we present a new approach that takes objects (or points) as the atomic units, so that the restriction on cell size can be relaxed without degrading the resolution of the clustering results. In addition, we introduce the concept of ith-order neighbors to avoid considering the exponential number of neighboring cells. By considering only low-order neighbors, our algorithm is very efficient while losing only a little accuracy. Experiments on synthetic and public data show that our algorithm can cluster high-dimensional data effectively and efficiently. © 2004 IEEE.
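The abstract does not define ith-order neighbors, so the Python sketch below rests on an assumption: it treats an ith-order neighbor of a cell as a cell whose index vector differs by exactly 1 in exactly i dimensions. Under that assumption a cell has C(d, i) * 2^i ith-order neighbors, and summing over i = 1..d gives the familiar 3^d - 1 adjacent cells, which is the exponential growth the abstract refers to. The clustering routine is a hypothetical DBSCAN-style expansion over a grid index, not the authors' algorithm. All names (`neighbor_count`, `low_order_offsets`, `grid_cluster`) and parameters (`cell_size`, `radius`, `min_pts`, `max_order`) are illustrative. It shows two ideas from the abstract: points, rather than cells, receive cluster labels, and the neighbor search visits only cells of order up to `max_order`.

```python
from collections import defaultdict
from itertools import combinations, product
from math import comb, dist


def neighbor_count(d, i):
    """Number of ith-order neighbors of a cell in a d-dimensional grid,
    ASSUMING an ith-order neighbor differs by +/-1 in exactly i coordinates."""
    return comb(d, i) * 2 ** i


def low_order_offsets(d, max_order):
    """Yield every cell offset with 1..max_order nonzero entries, each +/-1."""
    for i in range(1, max_order + 1):
        for dims in combinations(range(d), i):
            for signs in product((-1, 1), repeat=i):
                offset = [0] * d
                for axis, s in zip(dims, signs):
                    offset[axis] = s
                yield tuple(offset)


def grid_cluster(points, cell_size, radius, min_pts, max_order=1):
    """Hypothetical density/grid hybrid: points are the atomic units, and the
    grid is only an index. Returns one label per point (-1 means noise)."""
    d = len(points[0])

    def cell_of(p):
        # Floor division maps each coordinate to its interval index.
        return tuple(int(x // cell_size) for x in p)

    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[cell_of(p)].append(idx)

    # Search the point's own cell plus its neighbors of order <= max_order only.
    offsets = [(0,) * d] + list(low_order_offsets(d, max_order))

    def candidates(idx):
        key = cell_of(points[idx])
        for off in offsets:
            yield from cells.get(tuple(k + o for k, o in zip(key, off)), [])

    # Per-point radius neighborhoods, restricted to the low-order cells.
    neighbors = [
        [j for j in candidates(i) if j != i and dist(points[i], points[j]) <= radius]
        for i in range(len(points))
    ]
    # A point is "dense" (core) if its neighborhood, itself included, has min_pts points.
    core = [len(nb) + 1 >= min_pts for nb in neighbors]

    labels = [-1] * len(points)
    cluster = 0
    for start in range(len(points)):
        if not core[start] or labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:  # expand the cluster through core points only
            i = stack.pop()
            for j in neighbors[i]:
                if labels[j] == -1:
                    labels[j] = cluster
                    if core[j]:
                        stack.append(j)
        cluster += 1
    return labels


if __name__ == "__main__":
    # Two points that sit across a diagonal cell corner (offset (1, 1), order 2).
    pair = [(0.45, 0.45), (0.55, 0.55)]
    print(grid_cluster(pair, cell_size=0.5, radius=0.3, min_pts=2, max_order=1))
    print(grid_cluster(pair, cell_size=0.5, radius=0.3, min_pts=2, max_order=2))
```

With `radius <= cell_size`, setting `max_order = d` searches all 3^d - 1 adjacent cells and finds every neighbor within the radius. Smaller values of `max_order` visit only a polynomial number of cells, at the cost of missing pairs that straddle a high-order corner. The demo pair illustrates that trade-off: with `max_order=1` both points are left as noise, while with `max_order=2` they form one cluster. This is only one plausible reading of the accuracy loss the abstract mentions.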

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/2695