A hybrid coupled k-nearest neighbor algorithm on imbalance data
- Publication Type: Conference Proceeding
- Proceedings of the International Joint Conference on Neural Networks, 2014, pp. 2011 - 2018
- Issue Date:
© 2014 IEEE. State-of-the-art classification algorithms rarely consider the relationships between attributes in a data set and instead assume that the attributes are independent of each other (the IID assumption). In real-world data, however, attributes interact to some degree through explicit or implicit relationships. Although classifiers for class-balanced data are relatively well developed, classifying class-imbalanced data is not straightforward, especially for mixed-type data with both categorical and numerical features. Limited research has been conducted on class-imbalanced data. Some algorithms synthesize or remove instances to make the class sizes comparable, which may change the inherent data structure or introduce noise into the source data. Distance- or similarity-based algorithms, meanwhile, ignore the relationships between features when computing similarity. This paper proposes a hybrid coupled k-nearest neighbor classification algorithm (HC-kNN) for mixed-type data. It discretizes numerical features so that the inter-coupling similarity used for categorical features can be applied to them, then combines this coupled similarity with the original similarity or distance to overcome the shortcomings of previous algorithms. The experimental results demonstrate that the proposed algorithm achieves higher average performance than related algorithms (e.g., variants of kNN, Decision Tree, SMOTE, and NaiveBayes).
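The pipeline outlined in the abstract (discretize numerical features, compute a coupled similarity between attribute values, combine it with the original similarity, then vote among the k nearest neighbors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not give the coupling formula, so the sketch assumes a common co-occurrence formulation in which two values of one attribute are similar when they induce similar conditional distributions over the other attributes. The function names (`make_binner`, `coupled_value_sim`, `hc_knn_sketch`), the equal-width binning, the mixing weight `alpha`, and the toy data are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

def make_binner(column, bins=3):
    """Return a function mapping a number to an equal-width bin index (assumed scheme)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant column
    return lambda v: max(0, min(int((v - lo) / width), bins - 1))

def coupled_value_sim(rows, j):
    """Build sim(x, y) for values of attribute j from co-occurrence with other attributes.

    Assumed formulation: average over other attributes k of
    sum_v min(P(k = v | j = x), P(k = v | j = y)).
    """
    tables = []
    for k in range(len(rows[0])):
        if k == j:
            continue
        counts = defaultdict(Counter)
        for r in rows:
            counts[r[j]][r[k]] += 1
        tables.append({x: {v: c / sum(cnt.values()) for v, c in cnt.items()}
                       for x, cnt in counts.items()})

    def sim(x, y):
        if x == y:
            return 1.0
        if not tables:
            return 0.0
        total = 0.0
        for t in tables:
            px, py = t.get(x, {}), t.get(y, {})
            total += sum(min(p, py.get(v, 0.0)) for v, p in px.items())
        return total / len(tables)

    return sim

def hc_knn_sketch(train_X, train_y, query, numeric_idx, k=3, alpha=0.5, bins=3):
    """Predict a label by similarity-weighted voting over the k most similar rows."""
    n_attr = len(query)
    columns = {j: [r[j] for r in train_X] for j in numeric_idx}
    binners = {j: make_binner(col, bins) for j, col in columns.items()}
    ranges = {j: (max(col) - min(col)) or 1.0 for j, col in columns.items()}

    def to_cat(row):
        # Numerical attributes become bin labels; categorical ones pass through.
        return tuple(binners[j](row[j]) if j in binners else row[j]
                     for j in range(n_attr))

    cat_rows = [to_cat(r) for r in train_X]
    sims = [coupled_value_sim(cat_rows, j) for j in range(n_attr)]
    q_cat = to_cat(query)

    def base_sim(a, b, j):
        # Original similarity: scaled distance for numbers, exact match otherwise.
        if j in ranges:
            return max(0.0, 1.0 - abs(a - b) / ranges[j])
        return 1.0 if a == b else 0.0

    scored = []
    for row, cat, label in zip(train_X, cat_rows, train_y):
        s = sum(alpha * base_sim(row[j], query[j], j)
                + (1 - alpha) * sims[j](cat[j], q_cat[j])
                for j in range(n_attr)) / n_attr
        scored.append((s, label))
    scored.sort(key=lambda t: t[0], reverse=True)

    votes = defaultdict(float)
    for s, label in scored[:k]:
        votes[label] += s
    return max(votes, key=votes.get)

# Toy imbalanced, mixed-type data (hypothetical): five majority rows, two minority rows.
train_X = [("red", 1.0), ("red", 1.2), ("red", 0.9), ("blue", 1.1),
           ("red", 1.3), ("blue", 5.0), ("blue", 5.2)]
train_y = ["maj", "maj", "maj", "maj", "maj", "min", "min"]
print(hc_knn_sketch(train_X, train_y, ("blue", 4.8), numeric_idx={1}))
```

Because the sketch leaves the training set untouched, it avoids the resampling step that the abstract criticizes. The weighting between the original and coupled similarity is controlled here by the assumed parameter `alpha`, which would need tuning on real data.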