Semi-supervised Variable Weighting for Clustering

SIAM / Omnipress
Publication Type:
Conference Proceeding
Proceedings of the Eleventh SIAM International Conference on Data Mining, 2011, pp. 863-871
Issue Date:
Files in This Item:
Filename: 2010005231OK.pdf
Size: 393.53 kB
Format: Adobe PDF
Semi-supervised learning, which trains on a small amount of labeled data together with a large amount of unlabeled data, has recently attracted considerable research attention because it can substantially improve learning accuracy. In this work, we focus on semi-supervised variable weighting for clustering. Variable weighting is a critical step in clustering, since interesting clustering structure usually occurs in a subspace defined by a subset of the variables. Our method exploits both labeled and unlabeled data to identify the real importance of each variable. It also embeds variable weighting in the semi-supervised clustering process itself, rather than calculating variable weights separately, which keeps the computation efficient. Experiments on both synthetic and real data demonstrate that semi-supervised variable weighting significantly improves clustering accuracy over existing semi-supervised k-means, both without variable weighting and with unsupervised variable weighting.
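The abstract does not give the update rules, so the following is only a minimal sketch of the general idea of embedding variable weighting inside a semi-supervised k-means loop. It is not the paper's algorithm. It assumes a constrained k-means scheme (labeled points seed the centers and keep their labels) combined with a weight update in the style of W-k-means, where a variable with lower within-cluster dispersion receives a higher weight. The function name `semi_supervised_weighted_kmeans`, the exponent `beta`, and the convention that `-1` marks an unlabeled point are all illustrative assumptions.

```python
import numpy as np


def semi_supervised_weighted_kmeans(X, labels, k, beta=2.0, n_iter=50, eps=1e-12):
    """Illustrative sketch only, not the method from the paper.

    labels[i] is a cluster id in 0..k-1 for a labeled point and -1 for an
    unlabeled one. Assumes every cluster has at least one labeled point
    and beta > 1.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, d = X.shape
    labeled = labels >= 0

    # Seed each center with the mean of its labeled points.
    centers = np.stack([X[labels == c].mean(axis=0) for c in range(k)])
    weights = np.full(d, 1.0 / d)  # start with equal variable weights
    assign = labels.copy()

    for _ in range(n_iter):
        # Weighted squared distance from every point to every center.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2  # shape (n, k, d)
        dist = diff2 @ (weights ** beta)                     # shape (n, k)
        new_assign = dist.argmin(axis=1)
        new_assign[labeled] = labels[labeled]  # labeled points keep their labels

        # Recompute centers from labeled and unlabeled members together.
        for c in range(k):
            centers[c] = X[new_assign == c].mean(axis=0)

        # Per-variable within-cluster dispersion; eps avoids division by zero.
        disp = ((X - centers[new_assign]) ** 2).sum(axis=0) + eps
        inv = disp ** (-1.0 / (beta - 1.0))
        weights = inv / inv.sum()  # weights are non-negative and sum to 1

        if np.array_equal(new_assign, assign):
            break
        assign = new_assign

    return assign, centers, weights
```

Because the weights are recomputed inside the same loop that updates assignments and centers, no separate weighting pass over the data is needed, which is the efficiency argument the abstract makes for embedding the two steps.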