Empirical study of bagging predictors on medical data

Liang, G; Zhang, C

Publication Type: Conference Proceeding
Citation: Conferences in Research and Practice in Information Technology Series, 2010, 121, pp. 31-40
Issue Date: 2010-12-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Liang, G https://orcid.org/0000-0002-6843-7431	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.issued	2010-12-01	en_US
dc.identifier.citation	Conferences in Research and Practice in Information Technology Series, 2010, 121 pp. 31 - 40	en_US
dc.identifier.isbn	9781921770029	en_US
dc.identifier.issn	1445-1336	en_US
dc.identifier.uri	http://hdl.handle.net/10453/19124
dc.relation.ispartof	Conferences in Research and Practice in Information Technology Series	en_US
dc.title	Empirical study of bagging predictors on medical data	en_US
dc.type	Conference Proceeding
utslib.citation.volume	121	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
dc.location.activity	Ballarat, Australia	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	121	en_US

Abstract:

This study investigates how well bagging learns from imbalanced medical data. Data miners need highly accurate prediction models, and this is especially true for imbalanced medical applications. In these situations, practitioners are more interested in the minority class than the majority class; however, a traditional supervised learning algorithm struggles to predict the minority class accurately, even though it may score well on the most commonly used evaluation metric, Accuracy. Bagging is a simple yet effective ensemble method that has been applied to many real-world applications. However, some questions have not been well answered, e.g., whether bagging outperforms single learners on medical data-sets; which learners are the best predictors for each medical data-set; and what is the best predictive performance achievable for each medical data-set when sampling techniques are applied. We perform an extensive empirical study of 12 learning algorithms on 8 medical data-sets using four evaluation metrics: True Positive Rate (TPR), True Negative Rate (TNR), Geometric Mean (G-mean) of the accuracy rates of the majority class and the minority class, and Accuracy. In addition, the statistical analyses performed instil confidence in the validity of the conclusions of this research. © 2011, Australian Computer Society, Inc.
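
The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name imbalance_metrics, the toy labels, and the choice to treat the minority class as the positive class are all assumptions made for illustration. G-mean is computed as the square root of TPR times TNR, which is the usual reading of "geometric mean of the accuracy rates of the two classes".

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return TPR, TNR, G-mean and Accuracy for binary labels.

    Assumption: `positive` marks the minority class, so TPR is the
    accuracy rate on the minority class and TNR on the majority class.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)

    # Guard against a class that is absent from y_true.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0

    return {
        "TPR": tpr,
        "TNR": tnr,
        "G-mean": math.sqrt(tpr * tnr),
        "Accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical toy data: 2 minority cases (1) and 8 majority cases (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A predictor that always outputs the majority class.
always_majority = [0] * 10

# A predictor that finds one minority case at the cost of one false alarm.
one_hit = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]

print(imbalance_metrics(y_true, always_majority))
print(imbalance_metrics(y_true, one_hit))
```

On this toy data both predictors reach the same Accuracy, but only the second has a non-zero TPR and G-mean, which is the kind of gap between Accuracy and minority-class performance that the abstract describes.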

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/19124