Classification with label noise: a Markov chain sampling framework

Zhao, Z; Chu, L; Tao, D; Pei, J

Publication Type: Journal Article
Citation: Data Mining and Knowledge Discovery, 2019, 33 (5), pp. 1468-1504
Issue Date: 2019-09-01

Closed Access

Filename	Description	Size	Format
Zhao2019_Article_ClassificationWithLabelNoiseAM.pdf	Published Version	1.33 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhao, Z	en_US
dc.contributor.author	Chu, L	en_US
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449	en_US
dc.contributor.author	Pei, J	en_US
dc.date.issued	2019-09-01	en_US
dc.identifier.citation	Data Mining and Knowledge Discovery, 2019, 33 (5), pp. 1468 - 1504	en_US
dc.identifier.issn	1384-5810	en_US
dc.identifier.uri	http://hdl.handle.net/10453/138350
dc.description.abstract	© 2018, The Author(s). The effectiveness of classification methods relies largely on the correctness of instance labels. In real applications, however, the labels of instances are often not highly reliable due to the presence of label noise. Training effective classifiers in the presence of label noise is a challenging task that enjoys many real-world applications. In this paper, we propose a Markov chain sampling (MCS) framework that accurately identifies mislabeled instances and robustly learns effective classifiers. MCS builds a Markov chain where each state uniquely represents a set of randomly sampled instances. We show that the Markov chain has a unique stationary distribution, which puts much larger probability weights on the states dominated by correctly labeled instances than the states dominated by mislabeled instances. We propose a Markov Chain Monte Carlo sampling algorithm to approximate the stationary distribution, which is further used to compute the mislabeling probability for each instance, and train noise-resistant classifiers. The MCS framework is highly compatible with a wide spectrum of classifiers that produce probabilistic classification results. Extensive experiments on both real and synthetic data sets demonstrate the superior effectiveness and efficiency of the proposed MCS framework.	en_US
dc.relation.ispartof	Data Mining and Knowledge Discovery	en_US
dc.relation.isbasedon	10.1007/s10618-018-0592-8	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Classification with label noise: a Markov chain sampling framework	en_US
dc.type	Journal Article
utslib.citation.volume	5	en_US
utslib.citation.volume	33	en_US
utslib.for	0806 Information Systems	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0804 Data Format	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.issue	5	en_US
pubs.publication-status	Published	en_US
pubs.volume	33	en_US
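
The sampling idea summarized in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' MCS algorithm, whose details are not given in this record. It is a generic Metropolis sampler over fixed-size subsets of instances, built on assumptions made up for illustration: a hypothetical one-dimensional classifier (`toy_confidence`), a subset score equal to the sum of log-confidences (`subset_score`), and a "suspicion" value defined as one minus the fraction of sampled subsets that contain an instance (`suspicion_scores`).

```python
import math
import random


def toy_confidence(x, y):
    """Hypothetical probabilistic classifier: P(label == y | x) for a 1-D feature x."""
    p1 = 1.0 / (1.0 + math.exp(-4.0 * x))  # assumed probability of class 1
    return p1 if y == 1 else 1.0 - p1


def subset_score(subset, xs, ys):
    """Assumed score: sum of log-confidences of the observed labels in the subset."""
    return sum(math.log(max(toy_confidence(xs[i], ys[i]), 1e-12)) for i in subset)


def suspicion_scores(xs, ys, k, steps, burn_in=200, seed=0):
    """Metropolis sampling over size-k subsets (requires 0 < k < len(xs)).

    Each state of the chain is a set of k instance indices. With the symmetric
    swap proposal below, the chain targets a distribution proportional to
    exp(subset_score), so subsets with consistent labels are visited more often.
    Returns, per instance, 1 - (fraction of recorded states containing it).
    """
    rng = random.Random(seed)
    n = len(xs)
    state = set(rng.sample(range(n), k))
    score = subset_score(state, xs, ys)
    visits = [0] * n
    for step in range(burn_in + steps):
        # Symmetric proposal: swap one member for one non-member.
        out_i = rng.choice(sorted(state))
        in_i = rng.choice(sorted(set(range(n)) - state))
        proposal = (state - {out_i}) | {in_i}
        new_score = subset_score(proposal, xs, ys)
        # Metropolis acceptance rule for a symmetric proposal.
        if new_score >= score or rng.random() < math.exp(new_score - score):
            state, score = proposal, new_score
        if step >= burn_in:
            for i in state:
                visits[i] += 1
    return [1.0 - v / steps for v in visits]


# Toy data: the last two instances carry labels that contradict their features.
xs = [-2.0, -1.5, -1.0, 1.0, 1.5, 2.0, -1.8, 1.8]
ys = [0, 0, 0, 1, 1, 1, 1, 0]
for i, s in enumerate(suspicion_scores(xs, ys, k=4, steps=5000)):
    print(i, round(s, 3))
```

On this toy data, the two deliberately mislabeled instances (indices 6 and 7) should receive suspicion values close to 1, while the remaining instances should stay well below that. The score function, subset size, and suspicion measure are all illustrative choices, not quantities taken from the article.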

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/138350