Coupled fuzzy k-nearest neighbors classification of imbalanced non-IID categorical data
- Publication Type: Conference Proceeding
- Proceedings of the International Joint Conference on Neural Networks, 2014, pp. 1122 - 1129
© 2014 IEEE. Mining imbalanced data has recently received increasing attention because it is challenging and has wide real-world applications. Most existing work focuses on numerical data and either manipulates the data structure, which essentially changes the data characteristics, or develops new distance or similarity measures designed for data that satisfy the so-called IID assumption, namely that the data are independent and identically distributed. This is inconsistent with real-life data and business needs, which require fully respecting the data structure and the coupling relationships embedded in data objects, features and feature values. In this paper, we propose a novel coupled fuzzy similarity-based classification approach that captures the difference between classes through a fuzzy membership and the couplings through a coupled object similarity, and we incorporate both into the widely used kNN classifier to form a coupled fuzzy kNN (i.e., CF-kNN). We test the approach on 14 categorical data sets and compare it with several kNN variants and with classic classifiers including C4.5 and NaiveBayes. The experimental results show that CF-kNN outperforms the baselines, and that classifiers incorporating the proposed coupled fuzzy similarity perform better than their original versions.
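The abstract does not give the definitions of the coupled object similarity or the fuzzy membership, so the following is only a minimal sketch of the general idea of a similarity-weighted fuzzy kNN on categorical records, not CF-kNN itself. Two stand-ins are assumed here and are not taken from the paper: a plain overlap similarity (the fraction of matching feature values) in place of the coupled similarity, and an inverse-class-size weight in place of the fuzzy class membership. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def overlap_similarity(a, b):
    """Fraction of features on which two categorical records agree.

    Assumption: a simple placeholder, NOT the paper's coupled similarity.
    """
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fuzzy_knn_predict(train_X, train_y, query, k=3, similarity=overlap_similarity):
    """Return (predicted_label, class_memberships) for one query record."""
    class_counts = Counter(train_y)

    # Rank training records by similarity to the query and keep the top k.
    neighbors = sorted(
        ((similarity(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    # Each neighbor votes with its similarity, scaled by inverse class size
    # so that a small class is not outvoted purely because it is rare.
    # Assumption: this weighting is illustrative, not the paper's membership.
    votes = defaultdict(float)
    for sim, label in neighbors:
        votes[label] += sim / class_counts[label]

    total = sum(votes.values())
    if total == 0:
        # No neighbor shares any feature value: fall back to uniform memberships.
        memberships = {c: 1.0 / len(class_counts) for c in class_counts}
    else:
        memberships = {c: votes.get(c, 0.0) / total for c in class_counts}

    return max(memberships, key=memberships.get), memberships


# Toy imbalanced categorical data (hypothetical): 2 "minor" vs 4 "major" records.
train_X = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "round"),
    ("blue", "large", "oval"),
    ("blue", "small", "oval"),
    ("green", "large", "round"),
]
train_y = ["minor", "minor", "major", "major", "major", "major"]

label, memberships = fuzzy_knn_predict(train_X, train_y, ("red", "small", "round"), k=3)
print(label, memberships)
```

Because the similarity is passed in as a parameter, a measure that accounts for couplings between features and feature values could be substituted without changing the voting logic.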