A refinement approach to handling model misfit in semi-supervised learning

Publication Type:
Conference Proceeding
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2010, 6441 LNAI (PART 2), pp. 75 - 86
Issue Date:
Filename Description Size
Thumbnail2010003109OK.pdf441.25 kB
Adobe PDF
Full metadata record
Semi-supervised learning has been the focus of machine learning and data mining research in the past few years. Various algorithms and techniques have been proposed, from generative models to graph-based algorithms. In this work, we focus on the Cluster-and-Label approaches for semi-supervised classification. Existing cluster-and-label algorithms are based on some underlying models and/or assumptions. When the data fits the model well, the classification accuracy will be high. Otherwise, the accuracy will be low. In this paper, we propose a refinement approach to address the model misfit problem in semi-supervised classification. We show that we do not need to change the cluster-and-label technique itself to make it more flexible. Instead, we propose to use successive refinement clustering of the dataset to correct the model misfit. A series of experiments on UCI benchmarking data sets have shown that the proposed approach outperforms existing cluster-and-label algorithms, as well as traditional semi-supervised classification techniques including Selftraining and Tri-training. © 2010 Springer-Verlag.
Please use this identifier to cite or link to this item: