Efficient Mining of Distance-Based Subspace Clusters

dc.contributor.author Liu, G
dc.contributor.author Sim, K
dc.contributor.author Li, J
dc.contributor.author Wong, L
dc.date.accessioned 2012-02-02T04:21:36Z
dc.date.issued 2009-01
dc.identifier.citation Statistical Analysis and Data Mining, 2009, 2 (5-6), pp. 427 - 444
dc.identifier.issn 1932-1864
dc.identifier.other C1UNSUBMIT en_US
dc.identifier.uri http://hdl.handle.net/10453/14517
dc.description.abstract Traditional similarity measurements often become meaningless as the dimensionality of a dataset increases. Subspace clustering has been proposed to find clusters embedded in subspaces of high-dimensional datasets. Many existing algorithms use a grid-based approach to partition the data space into nonoverlapping rectangular cells and then identify connected dense cells as clusters. The rigid boundaries of the grid-based approach may cause a real cluster to be divided into several small clusters. In this paper, we propose a sliding-window approach to partitioning the dimensions so that significant clusters are preserved. We call this model the nCluster model. The sliding-window approach generates more bins than the grid-based approach and thus incurs a higher mining cost. We develop a deterministic algorithm, called MaxnCluster, to mine nClusters efficiently. MaxnCluster uses several techniques to speed up the mining, and it produces only maximal nClusters to reduce the result size. Non-maximal nClusters are pruned without storing the discovered nClusters in memory, which is key to the efficiency of MaxnCluster. Our experimental results show that (i) the nCluster model can indeed preserve clusters that are shattered by the grid-based approach on synthetic datasets; (ii) the nCluster model produces more significant clusters than the grid-based approach on two real gene expression datasets; and (iii) MaxnCluster is efficient in mining maximal nClusters.
dc.publisher John Wiley and Sons Inc
dc.relation.hasversion Accepted manuscript version en_US
dc.relation.isbasedon 10.1002/sam.10062
dc.title Efficient Mining of Distance-Based Subspace Clusters
dc.type Journal Article
dc.parent Statistical Analysis and Data Mining
dc.journal.volume 2
dc.journal.number 5-6 en_US
dc.publocation United States en_US
dc.identifier.startpage 427 en_US
dc.identifier.endpage 444 en_US
dc.cauo.name FEIT.Faculty of Engineering & Information Technology en_US
dc.conference Verified OK en_US
dc.for 0104 Statistics
dc.personcode 112261
dc.percentage 100 en_US
dc.classification.name Statistics en_US
dc.classification.type FOR-08 en_US
dc.description.keywords subspace clustering
dc.description.keywords distance-based clustering
dc.description.keywords biclustering
pubs.embargo.period Not known
pubs.organisational-group /University of Technology Sydney
pubs.organisational-group /University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group /University of Technology Sydney/Strength - Health Technologies
utslib.copyright.status Open Access
utslib.copyright.date 2015-04-15 12:23:47.074767+10
pubs.consider-herdc false
utslib.collection.history General (ID: 2)
utslib.collection.history Uncategorised (ID: 363)
