Evaluation of Sampling-Based Ensembles of Classifiers on Imbalanced Data for Software Defect Prediction Problems

Khuat, TT; Le, MH

Publisher: Springer Science and Business Media LLC
Publication Type: Journal Article
Citation: SN Computer Science, 2020, 1, (2), pp. 108
Issue Date: 2020-03-01

Closed Access

Filename	Description	Size	Format
s42979-020-0119-4.pdf	Published version	993.62 kB	Adobe PDF

Full metadata record

Field	Value	Language
dc.contributor.author	Khuat, TT
dc.contributor.author	Le, MH
dc.date.accessioned	2022-10-30T08:28:42Z
dc.date.available	2022-10-30T08:28:42Z
dc.date.issued	2020-03-01
dc.identifier.citation	SN Computer Science, 2020, 1, (2), pp. 108
dc.identifier.issn	2662-995X
dc.identifier.issn	2661-8907
dc.identifier.uri	http://hdl.handle.net/10453/162909
dc.description.abstract	Defect prediction in software projects plays a crucial role in reducing quality-based risk and increasing the capability to detect faulty program modules. Hence, classification approaches that anticipate software defect proneness from static code characteristics have attracted a great deal of attention in recent years. While several studies show that relying on a single classifier creates a performance bottleneck, ensembles of classifiers might effectively enhance classification performance compared to a single classifier. However, the class imbalance of software defect data severely hinders the classification efficiency of ensemble learning. To cope with this problem, resampling methods are usually combined with ensemble models. This paper empirically assesses the importance of sampling with regard to ensembles of various classifiers on imbalanced data in software defect prediction problems. Extensive experiments combining seven different kinds of classification algorithms, three sampling methods, and two balanced data learning schemata were conducted over ten datasets. Empirical results indicated that combining sampling techniques with the ensemble learning model has a positive effect on defect prediction performance for datasets with imbalanced class distributions.
dc.language	en
dc.publisher	Springer Science and Business Media LLC
dc.relation.ispartof	SN Computer Science
dc.relation.isbasedon	10.1007/s42979-020-0119-4
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Evaluation of Sampling-Based Ensembles of Classifiers on Imbalanced Data for Software Defect Prediction Problems
dc.type	Journal Article
utslib.citation.volume	1
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access	*
dc.date.updated	2022-10-30T08:28:41Z
pubs.issue	2
pubs.publication-status	Published
pubs.volume	1
utslib.citation.issue	2

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/162909