Binary teaching–learning-based optimization algorithm with a new update mechanism for sample subset optimization in software defect prediction

Publisher:
Springer (part of Springer Nature)
Publication Type:
Journal Article
Citation:
Soft Computing, 2019, 23, (20), pp. 9919-9935
Issue Date:
2019-10-01
Filename Description Size Format
s00500-018-3546-6.pdf  Published version  571.11 kB  Adobe PDF
© 2018, Springer-Verlag GmbH Germany, part of Springer Nature. Software defect prediction has gained considerable attention in recent years. A broad range of computational methods has been developed to predict faulty modules accurately from code and design metrics. One challenge in training classifiers is the highly imbalanced class distribution of the available datasets, which leads to an undesirable bias in prediction performance for the minority class. Data sampling is a widespread technique for tackling this problem. However, traditional sampling methods, which depend mainly on random resampling from a given dataset, do not take advantage of useful information available in training sets, such as sample quality and representative instances. To cope with this limitation, evolutionary undersampling methods are commonly used to identify an optimal sample subset of the training dataset. This paper proposes a binary teaching–learning-based optimization algorithm employing a distribution-based solution update rule, namely BTLBOd, to generate a balanced subset of highly valuable examples. This subset is then used to train a classifier for reliable prediction of potentially defective modules in a software system. Each individual in BTLBOd comprises two vectors: a real-valued vector generated by the distribution-based update mechanism, and a binary vector produced from the corresponding real vector by a proposed mapping function. Empirical results showed that the optimal sample subset produced by BTLBOd may improve the classification accuracy of the predictor on highly imbalanced software defect data. The results also demonstrated the superior performance of the proposed sampling method compared to other popular sampling techniques.
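The two-vector representation described in the abstract can be illustrated with a small sketch. The abstract does not specify the mapping function or the update rule, so everything below is an assumption made for illustration only: the sigmoid-with-threshold mapping, the function names (to_binary, select_subset, class_counts), and the toy data are all hypothetical and are not taken from the paper. The sketch shows only how a binary vector could act as a mask that selects a sample subset from an imbalanced training set.

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def to_binary(real_vector, threshold=0.5):
    """Map a real-valued vector to a 0/1 vector.

    ASSUMPTION: a sigmoid followed by a fixed threshold. The paper's
    actual mapping function is not given in the abstract.
    """
    return [1 if sigmoid(x) >= threshold else 0 for x in real_vector]


def select_subset(samples, labels, mask):
    """Keep sample i only where mask[i] == 1."""
    kept = [(s, y) for s, y, m in zip(samples, labels, mask) if m == 1]
    return [s for s, _ in kept], [y for _, y in kept]


def class_counts(labels):
    """Return (defective, non_defective) counts; label 1 means defective."""
    defective = sum(1 for y in labels if y == 1)
    return defective, len(labels) - defective


# Toy training set (hypothetical): two defective modules, four clean ones.
samples = ["m0", "m1", "m2", "m3", "m4", "m5"]
labels = [1, 0, 0, 0, 0, 1]

# One individual: a real-valued vector and the binary vector derived from it.
real_vector = [2.0, -1.5, 0.3, -0.7, 1.1, 0.9]
mask = to_binary(real_vector)

subset, subset_labels = select_subset(samples, labels, mask)
print(mask, subset, class_counts(subset_labels))
```

In an evolutionary undersampling loop, the classifier would be trained on each candidate subset and its validation performance would serve as the fitness of that individual; that step is omitted here because the abstract gives no details about it.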