Soft Dropout and Its Variational Bayes Approximation

Xie, J; Ma, Z; Zhang, G; Xue, J; Tan, Z; Guo, J

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: 2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), 2019, 2019-October
Issue Date: 2019-10-13

Closed Access

	Filename	Description	Size	Format
	08918818.pdf	Published version	331.32 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Xie, J
dc.contributor.author	Ma, Z
dc.contributor.author	Zhang, G https://orcid.org/0000-0003-4521-542X
dc.contributor.author	Xue, J
dc.contributor.author	Tan, Z
dc.contributor.author	Guo, J
dc.date	2019-10-13
dc.date.accessioned	2020-06-14T20:15:58Z
dc.date.available	2020-06-14T20:15:58Z
dc.date.issued	2019-10-13
dc.identifier.citation	2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP), 2019, 2019-October
dc.identifier.isbn	978-1-7281-0824-7
dc.identifier.issn	1551-2541
dc.identifier.issn	2161-0371
dc.identifier.uri	http://hdl.handle.net/10453/141419
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	2019 IEEE 29th International Workshop on Machine Learning for Signal Processing (MLSP)
dc.relation.ispartof	International Workshop on Machine Learning for Signal Processing
dc.relation.isbasedon	10.1109/MLSP.2019.8918818
dc.rights	© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.title	Soft Dropout and Its Variational Bayes Approximation
dc.type	Conference Proceeding
utslib.citation.volume	2019-October
utslib.location.activity	Pittsburgh, PA, USA
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2020-06-14T20:15:55Z
pubs.finish-date	2019-10-16
pubs.place-of-publication	Piscataway, USA
pubs.publication-status	Published
pubs.start-date	2019-10-13
pubs.volume	2019-October
dc.location	Piscataway, USA

Abstract:

Soft dropout, a generalization of standard “hard” dropout, is introduced to regularize the parameters of neural networks and prevent overfitting. We replace the “hard” dropout mask, which follows a Bernoulli distribution, with a “soft” mask that follows a beta distribution, so that hidden nodes are dropped to different degrees. Soft dropout thus introduces continuous mask coefficients in the interval [0, 1], rather than only zero and one. To make the dropout rate adaptive through adaptive distribution parameters, we approximate the beta-distributed masks with half-Gaussian and half-Laplace distributed variables, respectively, and optimize the distribution parameters with the stochastic gradient variational Bayes (SGVB) algorithm, a variant of variational Bayes optimization. In the experiments, the adaptive soft dropout method generally improves performance over standard soft dropout with a fixed dropout rate. In addition, the proposed soft dropout and its adaptive versions outperform the compared methods on both image classification and regression tasks.
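
The difference between a hard mask and a soft mask can be illustrated with a minimal Python sketch using only the standard library. It is not the authors' implementation: the function names, the shape parameters (0.5, 0.5), the scale sigma, and the clipping of the half-Gaussian draw to [0, 1] are all assumptions made for illustration, since the abstract does not state how the paper maps half-Gaussian variables onto the mask interval. Rescaling of the surviving activations is also left out.

```python
import random


def hard_mask(n, drop_rate, rng):
    """Standard dropout: each entry is 0 with probability drop_rate, else 1."""
    return [0.0 if rng.random() < drop_rate else 1.0 for _ in range(n)]


def soft_mask(n, a, b, rng):
    """Soft dropout sketch: each entry is drawn from Beta(a, b), so it lies in [0, 1].

    The mean of Beta(a, b) is a / (a + b); the shape values are hypothetical.
    """
    return [rng.betavariate(a, b) for _ in range(n)]


def half_gaussian_mask(n, sigma, rng):
    """Reparameterized half-Gaussian draw: |sigma * eps| with eps ~ N(0, 1).

    Writing the sample as a deterministic function of sigma and the noise eps
    is what lets a gradient-based method such as SGVB adjust sigma.
    Clipping to [0, 1] is an assumption of this sketch, not taken from the paper.
    """
    return [min(1.0, abs(sigma * rng.gauss(0.0, 1.0))) for _ in range(n)]


def apply_mask(hidden, mask):
    """Element-wise product of hidden activations and mask coefficients."""
    return [h * m for h, m in zip(hidden, mask)]


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
hidden = [0.5, -1.2, 2.0, 0.3, -0.7, 1.1]

hard = hard_mask(len(hidden), 0.5, rng)
soft = soft_mask(len(hidden), 0.5, 0.5, rng)
half = half_gaussian_mask(len(hidden), 0.5, rng)

print("hard mask:", hard)
print("soft mask:", [round(m, 3) for m in soft])
print("half-Gaussian mask:", [round(m, 3) for m in half])
print("soft-masked activations:", [round(x, 3) for x in apply_mask(hidden, soft)])
```

With shape parameters below one, a beta distribution places most of its mass near 0 and 1, so Beta(0.5, 0.5) behaves like a relaxed version of a Bernoulli mask with drop rate 0.5, while larger shape values concentrate the coefficients around the mean.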

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/141419