A suite of swarm dynamic multi-objective algorithms for rebalancing extremely imbalanced datasets

Li, J; Fong, S; Wong, RK; Mohammed, S; Fiaidhi, J; Sung, Y


Publication Type: Journal Article
Citation: Applied Soft Computing Journal, 2018, 69 pp. 784 - 805
Issue Date: 2018-08-01

Closed Access

Filename	Description	Size	Format
1-s2.0-S1568494617306919-main.pdf	Published Version	2.63 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413	en_US
dc.contributor.author	Fong, S	en_US
dc.contributor.author	Wong, RK	en_US
dc.contributor.author	Mohammed, S	en_US
dc.contributor.author	Fiaidhi, J	en_US
dc.contributor.author	Sung, Y	en_US
dc.date.issued	2018-08-01	en_US
dc.identifier.citation	Applied Soft Computing Journal, 2018, 69 pp. 784 - 805	en_US
dc.identifier.issn	1568-4946	en_US
dc.identifier.uri	http://hdl.handle.net/10453/130119
dc.relation.ispartof	Applied Soft Computing Journal	en_US
dc.relation.isbasedon	10.1016/j.asoc.2017.11.028	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	A suite of swarm dynamic multi-objective algorithms for rebalancing extremely imbalanced datasets	en_US
dc.type	Journal Article
utslib.citation.volume	69	en_US
utslib.for	0102 Applied Mathematics	en_US
utslib.for	0801 Artificial Intelligence And Image Processing	en_US
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
utslib.copyright.status	closed_access
pubs.declined	2019-02-05T12:55:49.377+1100
pubs.publication-status	Published	en_US
pubs.volume	69	en_US

Abstract:

Imbalanced datasets can be found in a number of fields; they are commonly regarded as big data because of their sheer volume and high attribute dimensions. As the name suggests, imbalanced big datasets have an extremely imbalanced ratio between the number of majority class and minority class samples. Traditional methods have been attempted but still cannot fully, effectively, and reliably solve the imbalanced class classification problem, especially when the distribution of the classes is exceedingly imbalanced. In this paper, we propose a collection of algorithms to solve the problem of imbalanced datasets in binary data classification. Most traditional methods rebalance the imbalanced dataset merely by matching the data quantities of the two classes. Our proposed algorithms, which take the form of a suite of variants, focus on guaranteeing the credibility of the classification model and reaching the greatest possible accuracy by dynamically rebalancing the training dataset with multi-objective swarm intelligence optimisation. The new algorithms are extended from those we proposed earlier, which had a single objective: first find a set of solutions that satisfy the Kappa criterion, then search that set for the solution that offers the highest accuracy. Two main modifications are made in the new algorithms. The first is multi-objective optimisation, which aims to find a solution that satisfies several criteria at the same time, such as accuracy and a list of credibility indicators. The second is the incremental operation of the multi-objective optimisation. Incremental optimisation is imperative for processing data feeds that may arrive in a streaming manner: instead of waiting for the full data archive to be available before optimisation, it rebalances the data feed segment by segment on the fly.

The experimental results show that the suite of proposed algorithms attains better and more stable performance from the classification model, with much greater credibility, than five traditional methods when imbalanced datasets are used as training datasets for inducing a classifier.
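
The abstract's point that accuracy alone is not a credible measure on extremely imbalanced data, and the earlier two-step scheme it mentions (filter by the Kappa criterion, then pick the highest accuracy), can be illustrated with a minimal Python sketch. This is not the authors' implementation and contains no swarm search: the candidate confusion matrices, the names `Confusion`, `accuracy`, `kappa`, and `select`, and the Kappa threshold of 0.4 are all assumptions made for illustration only.

```python
from typing import Dict, NamedTuple, Optional


class Confusion(NamedTuple):
    """Binary confusion matrix; the minority class is treated as positive."""
    tp: int
    fn: int
    fp: int
    tn: int


def accuracy(c: Confusion) -> float:
    """Fraction of samples classified correctly."""
    return (c.tp + c.tn) / sum(c)


def kappa(c: Confusion) -> float:
    """Cohen's kappa: agreement beyond what chance alone would produce."""
    n = sum(c)
    observed = (c.tp + c.tn) / n
    # Chance agreement from the actual and predicted class totals.
    expected = ((c.tp + c.fn) * (c.tp + c.fp)
                + (c.tn + c.fp) * (c.tn + c.fn)) / (n * n)
    if expected == 1.0:
        return 0.0
    return (observed - expected) / (1.0 - expected)


def select(candidates: Dict[str, Confusion],
           kappa_threshold: float) -> Optional[str]:
    """Two-step selection: keep candidates meeting the Kappa criterion,
    then return the name of the one with the highest accuracy."""
    credible = {name: c for name, c in candidates.items()
                if kappa(c) >= kappa_threshold}
    if not credible:
        return None
    return max(credible, key=lambda name: accuracy(credible[name]))


# Hypothetical outcomes on 1000 samples with 10 minority samples.
candidates = {
    "no_rebalance": Confusion(tp=0, fn=10, fp=0, tn=990),  # majority-only model
    "candidate_a": Confusion(tp=8, fn=2, fp=40, tn=950),
    "candidate_b": Confusion(tp=7, fn=3, fp=10, tn=980),
}

KAPPA_THRESHOLD = 0.4  # assumed value, not taken from the paper

best_by_accuracy = max(candidates, key=lambda name: accuracy(candidates[name]))
best_credible = select(candidates, KAPPA_THRESHOLD)
```

In this toy setting the model that predicts only the majority class has the highest accuracy but a Kappa of zero, so selecting on accuracy alone would pick it, while the two-step selection rejects it. The multi-objective and incremental variants described in the abstract would replace the fixed candidate list with a swarm search evaluated segment by segment, which this sketch does not attempt.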

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/130119