Similarity majority under-sampling technique for easing imbalanced classification problem

Publication Type: Conference Proceeding
Citation: Communications in Computer and Information Science, 2018, 845, pp. 3-23
Issue Date: 2018-01-01
© Springer Nature Singapore Pte Ltd. 2018. Imbalanced classification is an active topic in data mining, machine learning and pattern recognition. When class distributions are imbalanced, a classifier tends to over-fit by learning from too many majority class samples while under-fitting the minority class samples it has to recognize. Prior methods ease the imbalance problem through sampling techniques that rebalance the class distributions of the dataset. In this paper, we propose a novel method that under-samples the majority class to adjust the original imbalanced class distribution. The method is called Similarity Majority Under-sampling Technique (SMUTE). By calculating the similarity between each majority class sample and its surrounding minority class samples, SMUTE effectively separates the majority and minority class samples to increase the recognition power for each class. The experimental results show that SMUTE can outperform current under-sampling methods when the same under-sampling rate is used.
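The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the similarity measure, the neighbourhood size, or which end of the ranking is retained, so everything below is an assumption. The sketch assumes cosine similarity, a hypothetical neighbourhood size `k`, a `rate` parameter read as the fraction of majority samples to keep, and a `keep` switch that makes the retention direction explicit. The function name `similarity_undersample` is also hypothetical.

```python
import numpy as np


def similarity_undersample(X_maj, X_min, rate=0.5, k=3, keep="most_similar"):
    """Return sorted indices of the majority samples to retain.

    Assumptions (not taken from the abstract): cosine similarity,
    the k most similar minority samples as the "surrounding" ones,
    and `rate` as the fraction of majority samples kept.
    """
    X_maj = np.asarray(X_maj, dtype=float)
    X_min = np.asarray(X_min, dtype=float)

    def _unit(A):
        # Scale each row to unit length; all-zero rows are left as zeros.
        norms = np.linalg.norm(A, axis=1, keepdims=True)
        return A / np.where(norms == 0, 1.0, norms)

    # Cosine similarity matrix of shape (n_majority, n_minority).
    sim = _unit(X_maj) @ _unit(X_min).T

    # Score each majority sample by the sum of its k largest similarities.
    k = min(k, X_min.shape[0])
    score = np.sort(sim, axis=1)[:, -k:].sum(axis=1)

    n_keep = max(1, int(round(rate * X_maj.shape[0])))
    order = np.argsort(score, kind="stable")  # ascending by score

    if keep == "most_similar":
        idx = order[-n_keep:]
    elif keep == "least_similar":
        idx = order[:n_keep]
    else:
        raise ValueError("keep must be 'most_similar' or 'least_similar'")
    return np.sort(idx)


if __name__ == "__main__":
    X_maj = [[1, 0], [0, 1], [1, 1], [-1, 0]]
    X_min = [[1, 0.1], [0.9, 0]]
    print(similarity_undersample(X_maj, X_min, rate=0.5, k=2))
```

With the toy data above, the majority samples at indices 0 and 2 point in nearly the same direction as the minority samples, so they are the ones retained under `keep="most_similar"`; switching to `keep="least_similar"` retains indices 1 and 3 instead. Which direction matches SMUTE should be checked against the paper itself.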