An Empirical Evaluation of Bagging with Different Algorithms on Imbalanced Data

Publisher: Springer-Verlag Berlin / Heidelberg
Publication Type: Conference Proceeding
Citation: Advanced Data Mining and Applications. Lecture Notes in Artificial Intelligence 7120, 2011, pp. 339-352
Issue Date: 2011-01
This study investigates the effectiveness of bagging with different learning algorithms on imbalanced data sets. The research examines the performance of bagging from two perspectives: (1) categorizing base learners, in general terms, across 12 different learning algorithms, and (2) evaluating the performance of bagging predictors on data with imbalanced class distributions. For the first, we develop a method that categorizes base learners using a two-dimensional robustness and stability decomposition on 48 benchmark data sets. For the second, we assess bagging predictors on 12 imbalanced data sets using three evaluation metrics: the True Positive Rate (TPR), the Geometric mean (G-mean) of the accuracies on the majority and minority classes, and the Receiver Operating Characteristic (ROC) curve. Our studies show that both stability and robustness are important factors in building high-performance bagging predictors on data with imbalanced class distributions. The experimental results demonstrate that PART and Multi-layer Perceptron (MLP) are the learning algorithms with the best bagging performance on the 12 imbalanced data sets. Moreover, only four of the 12 bagging predictors are statistically superior to their single learners on both the G-mean and TPR metrics over the 12 imbalanced data sets.
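To make the two scalar metrics named in the abstract concrete, the following is a minimal sketch of how TPR and G-mean might be computed from predicted labels. It assumes the minority class is treated as the positive class and uses the common definition of G-mean as the square root of TPR times the true negative rate; the paper's exact formulation may differ, and the function name `tpr_and_gmean` and the toy labels are purely illustrative.

```python
import math

def tpr_and_gmean(y_true, y_pred, positive=1):
    """Return (TPR, G-mean) for binary labels.

    Assumption: `positive` marks the minority class, and
    G-mean = sqrt(TPR * TNR), a common definition that may not
    match the paper's exact formulation.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against an empty class to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, math.sqrt(tpr * tnr)

# Toy data (hypothetical): 2 minority examples, 8 majority examples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
tpr, gmean = tpr_and_gmean(y_true, y_pred)

# A classifier that always predicts the majority class.
majority_tpr, majority_gmean = tpr_and_gmean(y_true, [0] * len(y_true))
```

On this toy data the always-majority predictor is correct on 8 of 10 examples, yet both its TPR and G-mean are zero, which illustrates why metrics of this kind are preferred over plain accuracy when classes are imbalanced.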