Co-teaching: Robust training of deep neural networks with extremely noisy labels

Han, B; Yao, Q; Yu, X; Niu, G; Xu, M; Hu, W; Tsang, IW; Sugiyama, M

Publication Type: Conference Proceeding
Citation: Advances in Neural Information Processing Systems, 2018, 2018-December pp. 8527 - 8537
Issue Date: 2018-01-01

This item is open access.

Published version: Adobe PDF (1.12 MB)

Full metadata record

Field	Value	Language
dc.contributor.author	Han, B	en_US
dc.contributor.author	Yao, Q	en_US
dc.contributor.author	Yu, X https://orcid.org/0000-0002-8941-2698	en_US
dc.contributor.author	Niu, G	en_US
dc.contributor.author	Xu, M	en_US
dc.contributor.author	Hu, W	en_US
dc.contributor.author	Tsang, IW https://orcid.org/0000-0001-8095-4637	en_US
dc.contributor.author	Sugiyama, M	en_US
dc.date.issued	2018-01-01	en_US
dc.identifier.citation	Advances in Neural Information Processing Systems, 2018, 2018-December pp. 8527 - 8537	en_US
dc.identifier.issn	1049-5258	en_US
dc.identifier.uri	http://hdl.handle.net/10453/133283
dc.description.abstract	© 2018 Curran Associates Inc. All rights reserved. Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they would first memorize training data of clean labels and then those of noisy labels. Therefore, in this paper, we propose a new deep learning paradigm called “Co-teaching” for combating noisy labels. Namely, we train two deep neural networks simultaneously, and let them teach each other given every mini-batch: firstly, each network feeds forward all data and selects some data of possibly clean labels; secondly, two networks communicate with each other what data in this mini-batch should be used for training; finally, each network backpropagates the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.	en_US
dc.relation	http://purl.org/au-research/grants/arc/FT130100746
dc.relation	http://purl.org/au-research/grants/arc/LP150100671
dc.relation	http://purl.org/au-research/grants/arc/DP180100106
dc.relation.ispartof	Advances in Neural Information Processing Systems	en_US
dc.title	Co-teaching: Robust training of deep neural networks with extremely noisy labels	en_US
dc.type	Conference Proceeding
utslib.citation.volume	2018-December	en_US
utslib.for	1701 Psychology	en_US
utslib.for	1702 Cognitive Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	2018-December	en_US
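
The three-step mini-batch procedure summarized in the abstract (dc.description.abstract) can be illustrated with a small, self-contained Python sketch. It rests on several assumptions that the record itself does not state: "data of possibly clean labels" is taken to mean the samples with the smallest loss, following the memorization observation that clean labels are fit first; each "network" is a hypothetical two-feature logistic regression (`TinyNet`) rather than a deep net; the keep-ratio schedule `keep_ratio` (decaying linearly from 1 to 1 - tau over `t_k` epochs), the toy data generator `make_data`, and all hyperparameters are illustrative choices; and `tau` is set to the simulated noise rate, which presumes the noise rate is known. Treat it as a sketch of the idea, not the authors' implementation.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyNet:
    """Hypothetical stand-in for a deep network: 2-D logistic regression."""

    def __init__(self, seed):
        rng = random.Random(seed)  # different seeds give the peers different inits
        self.w = [rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)]
        self.b = 0.0

    def prob(self, x):
        return sigmoid(self.w[0] * x[0] + self.w[1] * x[1] + self.b)

    def loss(self, x, y):
        # Per-sample binary cross-entropy, clipped to avoid log(0).
        p = min(max(self.prob(x), 1e-12), 1.0 - 1e-12)
        return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

    def sgd_step(self, xs, ys, lr):
        # One gradient step on the mean cross-entropy of the given samples.
        gw0 = gw1 = gb = 0.0
        for x, y in zip(xs, ys):
            err = self.prob(x) - y
            gw0, gw1, gb = gw0 + err * x[0], gw1 + err * x[1], gb + err
        n = len(xs)
        self.w = [self.w[0] - lr * gw0 / n, self.w[1] - lr * gw1 / n]
        self.b -= lr * gb / n


def keep_ratio(epoch, tau, t_k):
    """Assumed schedule: keep all samples at first, decay linearly to 1 - tau by epoch t_k."""
    return 1.0 - min(epoch / t_k * tau, tau)


def small_loss_indices(losses, ratio):
    """Indices of the round(ratio * n) smallest-loss samples (at least one)."""
    k = max(1, int(round(ratio * len(losses))))
    return sorted(range(len(losses)), key=lambda i: losses[i])[:k]


def co_teaching_step(net_a, net_b, xs, ys, ratio, lr):
    # 1) Each network feeds forward the whole mini-batch.
    loss_a = [net_a.loss(x, y) for x, y in zip(xs, ys)]
    loss_b = [net_b.loss(x, y) for x, y in zip(xs, ys)]
    # 2) Each network selects its small-loss ("possibly clean") samples.
    pick_a = small_loss_indices(loss_a, ratio)
    pick_b = small_loss_indices(loss_b, ratio)
    # 3) Exchange: A is updated on B's picks, B on A's picks.
    net_a.sgd_step([xs[i] for i in pick_b], [ys[i] for i in pick_b], lr)
    net_b.sgd_step([xs[i] for i in pick_a], [ys[i] for i in pick_a], lr)
    return pick_a, pick_b


def make_data(n, noise, rng):
    # Clean rule: label 1 iff x0 + x1 > 0; each label flipped with probability `noise`.
    xs, clean, noisy = [], [], []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        y = 1 if x[0] + x[1] > 0 else 0
        xs.append(x)
        clean.append(y)
        noisy.append(1 - y if rng.random() < noise else y)
    return xs, clean, noisy


def run_demo(seed=0, noise=0.3, epochs=20, batch=32, lr=0.5):
    """Train two peers on noisy labels; return net A's accuracy on clean test labels."""
    rng = random.Random(seed)
    xs, _, noisy = make_data(400, noise, rng)
    test_xs, test_clean, _ = make_data(400, 0.0, rng)
    net_a, net_b = TinyNet(seed + 1), TinyNet(seed + 2)
    order = list(range(len(xs)))
    for epoch in range(epochs):
        rng.shuffle(order)
        ratio = keep_ratio(epoch, tau=noise, t_k=10)
        for start in range(0, len(order), batch):
            idx = order[start:start + batch]
            co_teaching_step(net_a, net_b, [xs[i] for i in idx],
                             [noisy[i] for i in idx], ratio, lr)
    hits = sum((net_a.prob(x) > 0.5) == (y == 1) for x, y in zip(test_xs, test_clean))
    return hits / len(test_xs)


if __name__ == "__main__":
    print("clean test accuracy of net A:", run_demo())
```

Calling `run_demo()` trains both peers on a toy 2-D problem in which 30% of the training labels are flipped at random and reports net A's accuracy against clean held-out labels. The essential part is step 3 of `co_teaching_step`: each network is updated only on the samples its peer selected. One plausible reading of why the selections are swapped rather than reused is that a network's own selection mistakes are then not fed straight back into its own updates.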

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/133283