A novel evolutionary preprocessing method based on over-sampling and under-sampling for imbalanced datasets

Wong, GY; Leung, FHF; Ling, SH

Publication Type: Conference Proceeding
Citation: IECON Proceedings (Industrial Electronics Conference), 2013, pp. 2354-2359
Issue Date: 2013-12-01

Full metadata record

Field	Value	Language
dc.contributor.author	Wong, GY	en_US
dc.contributor.author	Leung, FHF	en_US
dc.contributor.author	Ling, SH https://orcid.org/0000-0003-0849-5098	en_US
dc.date.issued	2013-12-01	en_US
dc.identifier.citation	IECON Proceedings (Industrial Electronics Conference), 2013, pp. 2354 - 2359	en_US
dc.identifier.isbn	9781479902248	en_US
dc.identifier.uri	http://hdl.handle.net/10453/119873
dc.description.abstract	Imbalanced datasets are commonly encountered in real-world classification problems. However, many machine learning algorithms were originally designed for well-balanced datasets. Re-sampling has therefore become an important step in preprocessing imbalanced datasets. It aims to balance a dataset by increasing the sample size of the smaller class or decreasing the sample size of the larger class, known as over-sampling and under-sampling respectively. In this paper, a novel sampling strategy based on both over-sampling and under-sampling is proposed, in which the new samples of the smaller class are created by the Synthetic Minority Over-sampling Technique (SMOTE). The datasets are then improved by the CHC evolutionary algorithm, which works on both the minority-class and majority-class samples. The result is a hybrid data preprocessing method that combines over-sampling and under-sampling techniques to re-sample datasets. The method is evaluated by applying the C4.5 learning algorithm to obtain a classification model from the re-sampled datasets. Experimental results show that the proposed approach can decrease the over-sampling rate by about 50% with only around a 3% discrepancy in accuracy. © 2013 IEEE.	en_US
dc.relation.ispartof	IECON Proceedings (Industrial Electronics Conference)	en_US
dc.relation.isbasedon	10.1109/IECON.2013.6699499	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	A novel evolutionary preprocessing method based on over-sampling and under-sampling for imbalanced datasets	en_US
dc.type	Conference Proceeding
utslib.for	080108 Neural, Evolutionary and Fuzzy Computation	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Biomedical Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
utslib.copyright.status	closed_access	*
pubs.publication-status	Published	en_US

Abstract:

Imbalanced datasets are commonly encountered in real-world classification problems. However, many machine learning algorithms were originally designed for well-balanced datasets. Re-sampling has therefore become an important step in preprocessing imbalanced datasets. It aims to balance a dataset by increasing the sample size of the smaller class or decreasing the sample size of the larger class, known as over-sampling and under-sampling respectively. In this paper, a novel sampling strategy based on both over-sampling and under-sampling is proposed, in which the new samples of the smaller class are created by the Synthetic Minority Over-sampling Technique (SMOTE). The datasets are then improved by the CHC evolutionary algorithm, which works on both the minority-class and majority-class samples. The result is a hybrid data preprocessing method that combines over-sampling and under-sampling techniques to re-sample datasets. The method is evaluated by applying the C4.5 learning algorithm to obtain a classification model from the re-sampled datasets. Experimental results show that the proposed approach can decrease the over-sampling rate by about 50% with only around a 3% discrepancy in accuracy. © 2013 IEEE.
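
Illustrative sketch (not taken from the paper): the abstract names SMOTE for creating minority samples and CHC for selecting samples from both classes, but gives no implementation details. The Python below is a minimal, standard-library-only sketch of that general idea under several assumptions. The function names (`smote`, `fitness`, `hux`, `chc_select`), the leave-one-out 1-NN fitness with a geometric mean of per-class accuracy, and all parameter values are hypothetical choices made for illustration. The paper evaluates with C4.5, which is not reproduced here, and the CHC restart step is omitted for brevity.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def smote(minority, n_new, k=3, rng=None):
    """SMOTE idea: interpolate between a minority sample and one of its
    k nearest minority neighbours. Needs at least two minority samples."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sq_dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

def fitness(mask, data):
    """Assumed fitness: geometric mean of per-class accuracy of a 1-NN rule
    that uses only the samples kept by `mask` (leave-one-out)."""
    classes = sorted({y for _, y in data})
    hits = dict.fromkeys(classes, 0)
    totals = dict.fromkeys(classes, 0)
    for i, (x, y) in enumerate(data):
        totals[y] += 1
        kept = [(sq_dist(x, p), lab)
                for j, (p, lab) in enumerate(data) if mask[j] and j != i]
        if kept and min(kept)[1] == y:
            hits[y] += 1
    score = 1.0
    for c in classes:
        score *= hits[c] / totals[c]
    return score ** (1.0 / len(classes))

def hux(a, b, rng):
    """Half-uniform crossover: swap half of the bits that differ."""
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    rng.shuffle(diff)
    c1, c2 = a[:], b[:]
    for i in diff[:len(diff) // 2]:
        c1[i], c2[i] = c2[i], c1[i]
    return c1, c2

def chc_select(data, pop_size=10, generations=15, rng=None):
    """Simplified CHC-style search over keep/drop masks (no restart step)."""
    rng = rng or random.Random(0)
    n = len(data)
    # Seed with the "keep everything" mask so the result is never worse than it.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    threshold = n // 4  # incest-prevention threshold
    for _ in range(generations):
        rng.shuffle(pop)
        children = []
        for a, b in zip(pop[::2], pop[1::2]):
            hamming = sum(x != y for x, y in zip(a, b))
            if hamming // 2 > threshold:  # mate only sufficiently different parents
                children.extend(hux(a, b, rng))
        if not children:
            threshold = max(0, threshold - 1)
        # Elitist survival: best pop_size of parents and children together.
        merged = sorted(pop + children, key=lambda m: fitness(m, data), reverse=True)
        pop = merged[:pop_size]
    return max(pop, key=lambda m: fitness(m, data))

# Toy data: 20 majority points (label 0) and 4 minority points (label 1).
rng = random.Random(42)
majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
minority = [(rng.gauss(3, 0.5), rng.gauss(3, 0.5)) for _ in range(4)]
synthetic = smote(minority, n_new=8, k=3, rng=rng)

data = [(p, 0) for p in majority] + [(p, 1) for p in minority + synthetic]
best = chc_select(data, rng=rng)
resampled = [sample for sample, keep in zip(data, best) if keep]
print(len(data), "samples before selection,", len(resampled), "after")
```

In this sketch each bit of a mask decides whether one sample (original or synthetic, from either class) is kept, so a single search can discard both majority samples and surplus synthetic minority samples. That is one plausible reading of how a hybrid method might reduce the over-sampling rate; the paper's actual encoding and fitness function may differ.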

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/119873