Adaptive multi-objective swarm fusion for imbalanced data classification

Li, J; Fong, S; Wong, RK; Chu, VW

Publication Type: Journal Article
Citation: Information Fusion, 2018, 39, pp. 1-24
Issue Date: 2018-01-01

Closed Access

	Filename	Description	Size	Format
	1-s2.0-S1566253517302087-main.pdf	Published Version	5.47 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, J https://orcid.org/0000-0003-1833-7413	en_US
dc.contributor.author	Fong, S	en_US
dc.contributor.author	Wong, RK	en_US
dc.contributor.author	Chu, VW	en_US
dc.date.issued	2018-01-01	en_US
dc.identifier.citation	Information Fusion, 2018, 39 pp. 1 - 24	en_US
dc.identifier.issn	1566-2535	en_US
dc.identifier.uri	http://hdl.handle.net/10453/130121
dc.description.abstract	© 2017 Elsevier B.V. Learning a classifier from an imbalanced dataset is an important problem in data mining and machine learning. Since an imbalanced dataset carries more information about the majority classes than about the minority classes, the classifier tends to become over-fitted to the former and under-fitted to the latter. Previous attempts to address the problem have focused on increasing the learning sensitivity to the minorities and/or rebalancing sample sizes among classes before learning. However, how to efficiently identify their optimal mix in rebalancing is still an unresolved problem. Due to non-linear relationships between attributes and class labels, merely rebalancing sample sizes rarely yields optimal results. Moreover, brute-force search for the perfect combination is known to be NP-hard, and hence a smarter heuristic is required. In this paper, we propose a notion of swarm fusion to address the problem: using stochastic swarm heuristics to cooperatively optimize the mixtures. Compared with conventional rebalancing methods, e.g., linear search, our novel fusion approach is able to find a close-to-optimal mix with improved accuracy and reliability. Most importantly, it has been found to be computationally faster than other coupled swarm optimization techniques and iteration methods. In our experiments, we first compared our proposed solution with traditional methods on thirty publicly available imbalanced datasets. Using a neural network as the base learner, our proposed method is found to outperform other traditional methods by up to 69% in terms of the credibility of the learned classifiers. Secondly, we wrapped our proposed swarm fusion method with a decision tree. Notably, it defeated six state-of-the-art methods on ten imbalanced datasets in all evaluation metrics that we considered.	en_US
dc.relation.ispartof	Information Fusion	en_US
dc.relation.isbasedon	10.1016/j.inffus.2017.03.007	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Adaptive multi-objective swarm fusion for imbalanced data classification	en_US
dc.type	Journal Article
utslib.citation.volume	39	en_US
utslib.for	0801 Artificial Intelligence And Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
pubs.organisational-group	/University of Technology Sydney/Strength - CHT - Health Technologies
utslib.copyright.status	closed_access
pubs.declined	2019-02-05T13:03:53.787+1100
pubs.publication-status	Published	en_US
pubs.volume	39	en_US

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/130121