A study of data pre-processing techniques for imbalanced biomedical data classification

Publisher:
Inderscience Publishers
Publication Type:
Journal Article
Citation:
International Journal of Bioinformatics Research and Applications, 2020, 16, (3), pp. 290-318
Issue Date:
2020-01-01
Filename:
ijbra.2020.109103.pdf
Description:
Published version
Size:
3.17 MB
Format:
Adobe PDF
© 2020 Inderscience Enterprises Ltd. Biomedical data are widely used to develop prediction models for identifying specific tumours, discovering drugs and detecting human cancers. However, previous studies have usually focused on comparing classifiers and have overlooked the class imbalance problem in real-world biomedical datasets. This paper reviews and evaluates popular and recently developed resampling and feature selection (FS) methods for class imbalance learning, taking the data distribution into account. Experimental results show that: 1) resampling and FS techniques exhibit better performance when used with a support vector machine (SVM) classifier; 2) random undersampling and FS perform better than other data pre-processing methods on data with a t location-scale distribution when using SVM and K-nearest neighbours (KNN) classifiers, while random oversampling outperforms other methods on data with a negative binomial distribution when using Random Forest at lower imbalance ratios; 3) FS outperforms other data pre-processing methods in most cases, so FS with an SVM classifier is the best choice for learning from imbalanced biomedical data.
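The two simplest resampling techniques named in the abstract, random undersampling and random oversampling, can be illustrated with a minimal sketch. This is not the authors' implementation, which the abstract does not describe. The function names, the toy data and the definition of the imbalance ratio as majority count divided by minority count are all assumptions made for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: size of the largest class over the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def _group_by_class(samples, labels):
    """Collect the samples belonging to each class label."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    return by_class


def random_undersample(samples, labels, seed=0):
    """Randomly drop samples until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = min(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        for x in rng.sample(xs, target):  # sampling without replacement
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def random_oversample(samples, labels, seed=0):
    """Randomly duplicate samples until every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(samples, labels)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep all originals, then add duplicates drawn with replacement.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy data: 10 majority-class and 2 minority-class samples.
samples = list(range(12))
labels = [0] * 10 + [1] * 2

under_x, under_y = random_undersample(samples, labels)
over_x, over_y = random_oversample(samples, labels)
```

Both functions return a balanced dataset that can then be passed to any classifier. Undersampling discards majority-class information, whereas oversampling duplicates minority-class samples, which is one reason their relative performance can depend on the classifier and the data distribution, as the reported results suggest.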