Shakeout: A New Approach to Regularized Deep Neural Network Training

Publication Type:
Journal Article
Citation:
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40 (5), pp. 1245 - 1258
Issue Date:
2018-05-01
File:
07920425.pdf (Published Version, Adobe PDF, 1.8 MB)
Abstract:
Recent years have witnessed the success of deep neural networks on a wide range of practical problems. Dropout has played an essential role in many successful deep neural networks by inducing regularization during model training. In this paper, we present a new regularized training approach: Shakeout. Instead of randomly discarding units as Dropout does at the training stage, Shakeout randomly chooses to enhance or reverse each unit's contribution to the next layer. This minor modification of Dropout has a notable statistical property: the regularizer induced by Shakeout adaptively combines L0, L1 and L2 regularization terms. Our classification experiments with representative deep architectures on the image datasets MNIST, CIFAR-10 and ImageNet show that Shakeout deals with over-fitting effectively and outperforms Dropout. We empirically demonstrate that Shakeout leads to sparser weights under both unsupervised and supervised settings. Shakeout also induces a grouping effect among the input units of a layer. Because the weights reflect the importance of connections, this sparsity makes Shakeout more valuable than Dropout for deep model compression. Moreover, we demonstrate that Shakeout can effectively reduce the instability of the training process of deep architectures.
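To make the "enhance or reverse" idea from the abstract concrete, the following is a minimal sketch of a Shakeout-style forward pass, not the authors' implementation. The hyperparameter names tau (probability of reversing a unit's contribution) and c (perturbation strength) and the exact scaling are assumptions chosen so that the perturbed weights are unbiased in expectation and so that c = 0 reduces to inverted Dropout; the paper's precise formulation may differ.

# Hypothetical sketch of a Shakeout-style layer (illustrative only).
# Assumptions: tau = probability of reversing an input unit's contribution,
# c = magnitude of the sign-dependent perturbation; c = 0 recovers inverted Dropout.
import numpy as np

def shakeout_weights(W, tau=0.5, c=0.1, rng=None):
    """Randomly enhance or reverse each input unit's contribution.

    W   : (n_in, n_out) weight matrix of a fully connected layer.
    tau : probability that an input unit's contribution is reversed.
    c   : strength of the sign-dependent perturbation.
    """
    rng = np.random.default_rng() if rng is None else rng
    # One Bernoulli draw per input unit, shared by all of its outgoing weights.
    keep = rng.random((W.shape[0], 1)) > tau              # True with prob 1 - tau
    enhanced = (W + c * tau * np.sign(W)) / (1.0 - tau)   # "enhance" branch
    reversed_ = -c * np.sign(W)                           # "reverse" branch
    return np.where(keep, enhanced, reversed_)            # expectation equals W

# Usage: apply the perturbed weights only at training time.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((4, 3))
    x = rng.standard_normal((2, 4))
    h_train = x @ shakeout_weights(W, tau=0.5, c=0.1, rng=rng)  # stochastic training pass
    h_test = x @ W                                              # deterministic test pass

The unbiasedness check is straightforward under these assumptions: (1 - tau) * (W + c*tau*sign(W)) / (1 - tau) + tau * (-c*sign(W)) = W, so the perturbation adds noise without shifting the expected pre-activation.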