An empirical evaluation of bagging with different algorithms on imbalanced data

Liang, G; Zhang, C

Publication Type: Conference Proceeding
Citation: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2011, 7120 LNAI (PART 1), pp. 339-352
Issue Date: 2011-12-28

Closed Access

	Filename	Description	Size	Format
	2010005259OK.pdf		4.8 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Liang, G https://orcid.org/0000-0002-6843-7431	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.issued	2011-12-28	en_US
dc.identifier.citation	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2011, 7120 LNAI (PART 1), pp. 339 - 352	en_US
dc.identifier.isbn	9783642258527	en_US
dc.identifier.issn	0302-9743	en_US
dc.identifier.uri	http://hdl.handle.net/10453/19210
dc.description.abstract	This study investigates the effectiveness of bagging with respect to different learning algorithms on imbalanced data-sets. The purpose of this research is to investigate the performance of bagging based on two unique approaches: (1) classify base learners with respect to 12 different learning algorithms in general terms, and (2) evaluate the performance of bagging predictors on data with imbalanced class distributions. The former approach develops a method to categorize base learners by using two-dimensional robustness and stability decomposition on 48 benchmark data-sets, while the latter approach investigates the performance of bagging predictors by using three evaluation metrics on 12 imbalanced data-sets: True Positive Rate (TPR), Geometric mean (G-mean) for the accuracy on the majority and minority classes, and the Receiver Operating Characteristic (ROC) curve. Our studies assert that both stability and robustness are important factors for building high-performance bagging predictors on data with imbalanced class distributions. The experimental results demonstrated that PART and Multi-layer Perceptron (MLP) are the learning algorithms with the best bagging performance on 12 imbalanced data-sets. Moreover, only four out of 12 bagging predictors are statistically superior to single learners based on both G-mean and TPR evaluation metrics over 12 imbalanced data-sets. © 2011 Springer-Verlag.	en_US
dc.relation.ispartof	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en_US
dc.relation.isbasedon	10.1007/978-3-642-25853-4_26	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	An empirical evaluation of bagging with different algorithms on imbalanced data	en_US
dc.type	Conference Proceeding
utslib.citation.volume	PART 1	en_US
utslib.citation.volume	7120 LNAI	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
dc.location.activity	Beijing, China	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	PART 1	en_US
pubs.publication-status	Published	en_US
pubs.volume	7120 LNAI	en_US

Abstract:

This study investigates the effectiveness of bagging with respect to different learning algorithms on imbalanced data-sets. The purpose of this research is to investigate the performance of bagging based on two unique approaches: (1) classify base learners with respect to 12 different learning algorithms in general terms, and (2) evaluate the performance of bagging predictors on data with imbalanced class distributions. The former approach develops a method to categorize base learners by using two-dimensional robustness and stability decomposition on 48 benchmark data-sets, while the latter approach investigates the performance of bagging predictors by using three evaluation metrics on 12 imbalanced data-sets: True Positive Rate (TPR), Geometric mean (G-mean) for the accuracy on the majority and minority classes, and the Receiver Operating Characteristic (ROC) curve. Our studies assert that both stability and robustness are important factors for building high-performance bagging predictors on data with imbalanced class distributions. The experimental results demonstrated that PART and Multi-layer Perceptron (MLP) are the learning algorithms with the best bagging performance on 12 imbalanced data-sets. Moreover, only four out of 12 bagging predictors are statistically superior to single learners based on both G-mean and TPR evaluation metrics over 12 imbalanced data-sets. © 2011 Springer-Verlag.
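
For readers unfamiliar with the two scalar metrics named in the abstract, the following is a minimal Python sketch of how TPR and G-mean are commonly computed. It is not taken from the paper: it assumes the minority class is the positive class (label 1) and uses the common definition of G-mean as the square root of TPR times the true negative rate. The function names and the toy labels are hypothetical.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp), treating `positive` as the minority class (assumption)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def tpr_and_gmean(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float]:
    """Return (TPR, G-mean), with G-mean = sqrt(TPR * TNR)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred, positive)
    # Guard against a class with no examples to avoid division by zero.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, math.sqrt(tpr * tnr)


# Hypothetical toy data: 2 minority examples (1) and 8 majority examples (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tpr, gmean = tpr_and_gmean(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} TPR={tpr:.3f} G-mean={gmean:.3f}")
```

In this toy example, overall accuracy is dominated by the majority class, while TPR and G-mean expose that half of the minority examples are missed. That is the usual motivation for preferring such metrics on imbalanced data.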

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/19210