Semi-supervised variable weighting for clustering

Publication Type: Conference Proceeding
Citation: Proceedings of the 11th SIAM International Conference on Data Mining, SDM 2011, 2011, pp. 862-871
Issue Date: 2011-12-01
Semi-supervised learning, which trains on a small amount of labeled data together with a large amount of unlabeled data, has recently attracted considerable research attention because it can substantially improve learning accuracy. In this work, we focus on semi-supervised variable weighting for clustering. Variable weighting is a critical step in clustering because interesting cluster structure usually occurs in a subspace defined by a subset of the variables. Our method exploits both labeled and unlabeled data to identify the real importance of each variable. To keep the computation efficient, it embeds variable weighting in the semi-supervised clustering process itself rather than calculating the variable weights separately. Experiments on both synthetic and real data demonstrate that semi-supervised variable weighting significantly improves clustering accuracy over existing semi-supervised k-means, whether used without variable weighting or with unsupervised variable weighting. Copyright © SIAM.
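The abstract does not give the update rules, so the following is only a minimal sketch of what embedding variable weighting inside a semi-supervised k-means loop could look like. It is not the algorithm from the paper. It assumes that labeled points (the hypothetical `seeds` argument) keep their labels and initialize the centers, and that each variable's weight is set inversely to its within-cluster dispersion, controlled by a hypothetical exponent `beta`. All function and parameter names are illustrative.

```python
from typing import Dict, List, Tuple


def weighted_sq_dist(x: List[float], c: List[float],
                     w: List[float], beta: float) -> float:
    """Squared distance with variable j scaled by w[j] ** beta."""
    return sum((wj ** beta) * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w))


def semi_supervised_weighted_kmeans(
    X: List[List[float]],
    seeds: Dict[int, int],
    k: int,
    beta: float = 2.0,
    n_iter: int = 20,
    eps: float = 1e-9,
) -> Tuple[List[int], List[float]]:
    """Illustrative sketch only; NOT the method described in the paper.

    X     : data points, one list of floats per point
    seeds : {point index: cluster id} for the labeled points; assumed to
            contain at least one labeled point for every cluster 0..k-1
    beta  : weight exponent, assumed > 1 (hypothetical tuning parameter)
    """
    m = len(X[0])
    w = [1.0 / m] * m                      # start with equal variable weights
    assign = [seeds.get(i, -1) for i in range(len(X))]  # -1 = unassigned

    def compute_centers() -> List[List[float]]:
        centers = []
        for c in range(k):
            members = [X[i] for i, a in enumerate(assign) if a == c]
            centers.append([sum(col) / len(members) for col in zip(*members)])
        return centers

    centers = compute_centers()            # initial centers from labeled points
    for _ in range(n_iter):
        # Assignment step: labeled points keep their labels, unlabeled
        # points go to the nearest center under the weighted distance.
        for i, x in enumerate(X):
            if i not in seeds:
                assign[i] = min(
                    range(k),
                    key=lambda c: weighted_sq_dist(x, centers[c], w, beta),
                )
        centers = compute_centers()

        # Weighting step: variables with small within-cluster dispersion
        # receive large weights; weights are normalized to sum to 1.
        disp = [eps] * m
        for x, a in zip(X, assign):
            for j in range(m):
                disp[j] += (x[j] - centers[a][j]) ** 2
        p = 1.0 / (beta - 1.0)
        inv = [d ** (-p) for d in disp]
        total = sum(inv)
        w = [v / total for v in inv]
    return assign, w


# Toy data: variable 0 separates the two groups, variable 1 is noise.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 9.0],
     [10.0, 2.0], [10.2, 8.0], [9.9, 4.0]]
seeds = {0: 0, 3: 1}                       # one labeled point per cluster
assign, weights = semi_supervised_weighted_kmeans(X, seeds, k=2)
```

Because the weights are recomputed inside the same loop that reassigns points, no separate weighting pass over the data is needed, which is the kind of saving the abstract alludes to. In this sketch the labeled points also guarantee that no cluster becomes empty.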