Cost-sensitive Semi-supervised Classification using CS-EM

IEEE Computer Society
Publication Type: Conference Proceeding
Proceedings of the 2008 IEEE 8th International Conference on Computer and Information Technology, 2008, pp. 131-136
In many real-world data mining and classification tasks, we face the high cost of building labeled training data sets. In addition, in many domains, different misclassification errors incur different costs. These two issues are usually addressed separately, by semi-supervised learning and by cost-sensitive learning respectively, yet in real-world applications they can arise at the same time. However, existing semi-supervised learning algorithms never consider misclassification costs. In this paper, we propose CS-EM, a simple and novel method for learning a cost-sensitive classifier from both labeled and unlabeled training data. CS-EM modifies EM, a popular semi-supervised learning algorithm, by incorporating misclassification costs into the probability estimation process. Our experiments show that CS-EM outperforms two other competing methods on three benchmark text data sets across different cost ratios.
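The abstract does not give CS-EM's actual update rules, so the following is only a rough sketch of the general idea under stated assumptions. It runs standard semi-supervised EM with a multinomial naive Bayes text model (a common choice for EM on text). As a hypothetical way of letting costs enter the probability estimates, each document's soft class responsibility is multiplied by the cost of misclassifying that class in the M-step (cost-proportional instance weighting). The function names (`m_step`, `posterior`, `cs_em_sketch`, `predict`), the `cost` dictionary, and this weighting rule are all assumptions, not the paper's method.

```python
import math
from collections import Counter


def m_step(docs, resp, classes, vocab, cost, alpha=1.0):
    """Fit multinomial naive Bayes from soft labels.

    ASSUMPTION: each responsibility r[c] is scaled by cost[c], a
    hypothetical cost-proportional weighting (not taken from the paper).
    """
    prior_w = {c: 0.0 for c in classes}
    word_w = {c: Counter() for c in classes}
    for tokens, r in zip(docs, resp):
        counts = Counter(tokens)
        for c in classes:
            w = cost[c] * r[c]  # cost-weighted soft count
            prior_w[c] += w
            for t, n in counts.items():
                word_w[c][t] += w * n
    total = sum(prior_w.values())
    # Laplace-smoothed log class priors and log word probabilities.
    log_prior = {c: math.log((prior_w[c] + alpha) / (total + alpha * len(classes)))
                 for c in classes}
    log_word = {}
    for c in classes:
        denom = sum(word_w[c].values()) + alpha * len(vocab)
        log_word[c] = {t: math.log((word_w[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_word


def posterior(tokens, log_prior, log_word):
    """E-step for one document: normalized class posteriors (log-sum-exp)."""
    scores = {c: log_prior[c] + sum(log_word[c][t] for t in tokens if t in log_word[c])
              for c in log_prior}
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {c: math.exp(s - m) / z for c, s in scores.items()}


def cs_em_sketch(labeled, unlabeled, classes, cost, n_iter=10):
    """labeled: [(tokens, label)], unlabeled: [tokens], cost: {class: weight}."""
    vocab = {t for tokens, _ in labeled for t in tokens} | {t for d in unlabeled for t in d}
    lab_docs = [tokens for tokens, _ in labeled]
    # Labeled documents keep fixed one-hot responsibilities.
    lab_resp = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    params = m_step(lab_docs, lab_resp, classes, vocab, cost)  # initialize from labeled data
    for _ in range(n_iter):
        unl_resp = [posterior(d, *params) for d in unlabeled]  # E-step
        params = m_step(lab_docs + unlabeled, lab_resp + unl_resp,
                        classes, vocab, cost)                  # M-step
    return params


def predict(tokens, params):
    post = posterior(tokens, *params)
    return max(post, key=post.get)


classes = ["spam", "ham"]
labeled = [(["cheap", "pills"], "spam"), (["meeting", "agenda"], "ham")]
unlabeled = [["cheap", "offer"], ["agenda", "notes"], ["pills", "offer"]]
params = cs_em_sketch(labeled, unlabeled, classes, {"spam": 1.0, "ham": 5.0})
print(predict(["meeting", "agenda"], params))
```

With equal costs the sketch reduces to plain semi-supervised EM. Raising `cost["ham"]` shifts prior and word mass toward `ham`, so fewer `ham` documents are misclassified at the price of more `spam` documents being labeled `ham`.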