Corrective classification: Learning from data imperfections with aggressive and diverse classifier ensembling

Zhang, Y; Zhu, X; Wu, X; Bond, JP

Publication Type: Journal Article
Citation: Information Systems, 2011, 36 (8), pp. 1135-1157
Issue Date: 2011-12-01

Closed Access

	Filename	Description	Size	Format
	2011000601OK.pdf		834.79 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, Y	en_US
dc.contributor.author	Zhu, X	en_US
dc.contributor.author	Wu, X	en_US
dc.contributor.author	Bond, JP	en_US
dc.date.issued	2011-12-01	en_US
dc.identifier.citation	Information Systems, 2011, 36 (8), pp. 1135 - 1157	en_US
dc.identifier.issn	0306-4379	en_US
dc.identifier.uri	http://hdl.handle.net/10453/18235
dc.description.abstract	Learning from imperfect (noisy) information sources is a challenging and reality issue for many data mining applications. Common practices include data quality enhancement by applying data preprocessing techniques or employing robust learning algorithms to avoid developing overly complicated structures that overfit the noise. The essential goal is to reduce noise impact and eventually enhance the learners built from noise-corrupted data. In this paper, we propose a novel corrective classification (C2) design, which incorporates data cleansing, error correction, Bootstrap sampling and classifier ensembling for effective learning from noisy data sources. C2 differs from existing classifier ensembling or robust learning algorithms in two aspects. On one hand, a set of diverse base learners of C2 constituting the ensemble are constructed via a Bootstrap sampling process; on the other hand, C2 further improves each base learner by unifying error detection, correction and data cleansing to reduce noise impact. Being corrective, the classifier ensemble is built from data preprocessed/corrected by the data cleansing and correcting modules. Experimental comparisons demonstrate that C2 is not only more accurate than the learner built from original noisy sources, but also more reliable than Bagging [4] or aggressive classifier ensemble (ACE) [56], which are two degenerated components/variants of C2. The comparisons also indicate that C2 is more stable than Boosting and DECORATE, which are two state-of-the-art ensembling methods. For real-world imperfect information sources (i.e. noisy training and/or test data), C2 is able to deliver more accurate and reliable prediction models than its other peers can offer. © 2011 Elsevier B.V. All rights reserved.	en_US
dc.relation	http://purl.org/au-research/grants/arc/FT100100971
dc.relation.ispartof	Information Systems	en_US
dc.relation.isbasedon	10.1016/j.is.2011.05.002	en_US
dc.subject.classification	Information Systems	en_US
dc.title	Corrective classification: Learning from data imperfections with aggressive and diverse classifier ensembling	en_US
dc.type	Journal Article
utslib.citation.volume	8	en_US
utslib.citation.volume	36	en_US
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAI - Advanced Analytics Institute Research Centre
utslib.copyright.status	closed_access
pubs.issue	8	en_US
pubs.publication-status	Published	en_US
pubs.volume	36	en_US

Abstract:

Learning from imperfect (noisy) information sources is a challenging and reality issue for many data mining applications. Common practices include data quality enhancement by applying data preprocessing techniques or employing robust learning algorithms to avoid developing overly complicated structures that overfit the noise. The essential goal is to reduce noise impact and eventually enhance the learners built from noise-corrupted data. In this paper, we propose a novel corrective classification (C2) design, which incorporates data cleansing, error correction, Bootstrap sampling and classifier ensembling for effective learning from noisy data sources. C2 differs from existing classifier ensembling or robust learning algorithms in two aspects. On one hand, a set of diverse base learners of C2 constituting the ensemble are constructed via a Bootstrap sampling process; on the other hand, C2 further improves each base learner by unifying error detection, correction and data cleansing to reduce noise impact. Being corrective, the classifier ensemble is built from data preprocessed/corrected by the data cleansing and correcting modules. Experimental comparisons demonstrate that C2 is not only more accurate than the learner built from original noisy sources, but also more reliable than Bagging [4] or aggressive classifier ensemble (ACE) [56], which are two degenerated components/variants of C2. The comparisons also indicate that C2 is more stable than Boosting and DECORATE, which are two state-of-the-art ensembling methods. For real-world imperfect information sources (i.e. noisy training and/or test data), C2 is able to deliver more accurate and reliable prediction models than its other peers can offer. © 2011 Elsevier B.V. All rights reserved.
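
The abstract names four ingredients: Bootstrap sampling, error detection, error correction and classifier ensembling. The sketch below shows one way these pieces could fit together. It is not the authors' C2 implementation, whose details are not given in this record. The nearest-centroid base learner, the rule that flags an instance as suspect when a draft model disagrees with its label, and all function names (`fit_centroids`, `predict_one`, `corrective_ensemble`, `predict`) are assumptions made purely for illustration.

```python
import random
from collections import Counter


def fit_centroids(rows, labels):
    """Hypothetical base learner: one mean vector per class label."""
    sums, counts = {}, Counter(labels)
    for x, y in zip(rows, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict_one(centroids, x):
    """Return the label of the closest centroid (ties broken by label order)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(centroids), key=lambda y: dist2(centroids[y]))


def corrective_ensemble(rows, labels, n_members=15, seed=0):
    """Build an ensemble where each member is trained on a corrected bootstrap sample."""
    rng = random.Random(seed)
    n = len(rows)
    members = []
    for _ in range(n_members):
        # Bootstrap sampling: draw n instances with replacement for diversity.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [rows[i] for i in idx]
        ys = [labels[i] for i in idx]
        # Error detection (assumed rule): a draft model trained on the noisy
        # sample flags every instance whose label it disagrees with.
        draft = fit_centroids(xs, ys)
        fixed = []
        for x, y in zip(xs, ys):
            guess = predict_one(draft, x)
            suspect = guess != y
            # Error correction: replace the suspect label with the draft's guess.
            fixed.append(guess if suspect else y)
        # The final base learner is built from the corrected sample.
        members.append(fit_centroids(xs, fixed))
    return members


def predict(members, x):
    """Classifier ensembling: majority vote over all base learners."""
    votes = Counter(predict_one(m, x) for m in members)
    return votes.most_common(1)[0][0]
```

Calling `corrective_ensemble(rows, labels)` on a list of numeric feature tuples and their (possibly noisy) labels returns the list of base learners, and `predict(members, x)` classifies a new instance. Skipping the detection and correction steps would reduce this sketch to plain Bagging, which the abstract describes as a degenerated variant of C2.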

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/18235