A unified analysis of stochastic momentum methods for deep learning

Yan, Y; Yang, T; Li, Z; Lin, Q; Yang, Y

Publication Type: Conference Proceeding
Citation: IJCAI International Joint Conference on Artificial Intelligence, 2018, vol. 2018-July, pp. 2955-2961
Issue Date: 2018-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Yan, Y	en_US
dc.contributor.author	Yang, T	en_US
dc.contributor.author	Li, Z	en_US
dc.contributor.author	Lin, Q	en_US
dc.contributor.author	Yang, Y https://orcid.org/0000-0001-5528-0546	en_US
dc.date.issued	2018-01-01	en_US
dc.identifier.citation	IJCAI International Joint Conference on Artificial Intelligence, 2018, 2018-July pp. 2955 - 2961	en_US
dc.identifier.isbn	9780999241127	en_US
dc.identifier.issn	1045-0823	en_US
dc.identifier.uri	http://hdl.handle.net/10453/131496
dc.description.abstract	© 2018 International Joint Conferences on Artificial Intelligence. All rights reserved. Stochastic momentum methods have been widely adopted in training deep neural networks. However, the theoretical analysis of their convergence on the training objective and of their generalization error for prediction remains under-explored. This paper aims to bridge the gap between practice and theory by analyzing the stochastic gradient (SG) method and two well-known stochastic momentum methods: the stochastic heavy-ball (SHB) method and the stochastic variant of Nesterov's accelerated gradient (SNAG) method. We propose a framework that unifies the three variants. We then derive convergence rates for the gradient norm in the non-convex optimization problem and analyze generalization performance through the uniform stability approach. In particular, the convergence analysis of the training objective shows that SHB and SNAG have no advantage over SG. However, the stability analysis shows that the momentum term can improve the stability of the learned model and hence its generalization performance. These theoretical insights confirm the conventional wisdom and are corroborated by our empirical analysis on deep learning.	en_US
dc.relation.ispartof	IJCAI International Joint Conference on Artificial Intelligence	en_US
dc.title	A unified analysis of stochastic momentum methods for deep learning	en_US
dc.type	Conference Proceeding
utslib.citation.volume	2018-July	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	2018-July	en_US
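
The abstract does not spell out the unified framework. The following is a minimal sketch assuming one common way of writing such a unification; the interpolation parameter s and the function name unified_momentum are illustrative assumptions, not taken from this record. With stochastic gradient g_t, step size alpha and momentum beta, set y_{t+1} = x_t - alpha*g_t, ys_{t+1} = x_t - s*alpha*g_t and x_{t+1} = y_{t+1} + beta*(ys_{t+1} - ys_t). Under this form, s = 0 recovers SHB, s = 1 recovers SNAG, and s = 1/(1 - beta) recovers plain SG with step size alpha/(1 - beta).

```python
import numpy as np


def unified_momentum(grad, x0, alpha, beta, s, num_iters):
    """Hypothetical unified stochastic momentum update (assumed form):

        y_{t+1}  = x_t - alpha * g_t
        ys_{t+1} = x_t - s * alpha * g_t
        x_{t+1}  = y_{t+1} + beta * (ys_{t+1} - ys_t)

    s = 0 gives heavy-ball (SHB), s = 1 gives Nesterov (SNAG), and
    s = 1 / (1 - beta) gives plain SG with step size alpha / (1 - beta).
    Returns the array of iterates x_0, ..., x_T.
    """
    x = np.asarray(x0, dtype=float).copy()
    ys_prev = x.copy()  # initial convention: ys_0 = x_0
    iterates = [x.copy()]
    for _ in range(num_iters):
        g = grad(x)                    # (stochastic) gradient at x_t
        y = x - alpha * g              # plain gradient step
        ys = x - s * alpha * g         # step scaled by s, used for the momentum term
        x = y + beta * (ys - ys_prev)  # add momentum built from consecutive ys
        ys_prev = ys
        iterates.append(x.copy())
    return np.array(iterates)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = np.diag([1.0, 10.0])  # toy quadratic f(x) = 0.5 * x^T A x (assumption)

    def noisy_grad(x):
        # exact gradient plus Gaussian noise to mimic a stochastic gradient
        return A @ x + 0.1 * rng.standard_normal(x.shape)

    beta = 0.9
    for name, s in [("SHB", 0.0), ("SNAG", 1.0), ("SG", 1.0 / (1.0 - beta))]:
        path = unified_momentum(noisy_grad, [1.0, 1.0], alpha=0.01, beta=beta, s=s, num_iters=200)
        print(name, "final gradient norm:", np.linalg.norm(A @ path[-1]))
```

Because the three methods differ only in s, they can be compared under identical step sizes and gradient noise. Note that the SG case corresponds to an effective step of alpha/(1 - beta), which matters when comparing convergence speeds.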

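
Uniform stability bounds how much the learned model can change when a single training example is replaced. The sketch below is only an empirical proxy, not the paper's analysis. It trains stochastic heavy-ball (beta = 0 reduces it to SG) on a synthetic least-squares problem (an assumption) using two datasets that differ in one example and share the same sampling order, then reports the distance between the final parameters. The names train_heavy_ball and parameter_divergence are hypothetical.

```python
import numpy as np


def train_heavy_ball(X, y, alpha, beta, order):
    """Stochastic heavy-ball on the per-example squared loss
    0.5 * (X[i] @ w - y[i])**2, visiting examples in `order`.
    beta = 0 reduces this to plain SG."""
    w = np.zeros(X.shape[1])
    w_prev = w.copy()
    for i in order:
        g = (X[i] @ w - y[i]) * X[i]  # gradient of the i-th squared loss
        # the right-hand side uses the old w and w_prev before both are updated
        w, w_prev = w - alpha * g + beta * (w - w_prev), w
    return w


def parameter_divergence(n=200, d=5, alpha=0.01, beta=0.9, epochs=5, seed=0):
    """Distance between models trained on two datasets that differ in one example."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
    # neighbouring dataset: replace example 0 with a fresh random draw
    X2, y2 = X.copy(), y.copy()
    X2[0] = rng.standard_normal(d)
    y2[0] = rng.standard_normal()
    # share the sampling order so the swapped example is the only difference
    order = np.concatenate([rng.permutation(n) for _ in range(epochs)])
    w1 = train_heavy_ball(X, y, alpha, beta, order)
    w2 = train_heavy_ball(X2, y2, alpha, beta, order)
    return float(np.linalg.norm(w1 - w2))


if __name__ == "__main__":
    for beta in (0.0, 0.5, 0.9):
        # keep the effective step alpha / (1 - beta) fixed across beta values
        print(beta, parameter_divergence(alpha=0.01 * (1.0 - beta), beta=beta))
```

Scaling alpha by (1 - beta) keeps the effective step size alpha/(1 - beta) fixed across beta values. This is one way to make the comparison fair, a design choice rather than something prescribed by this record. Averaging over several seeds would reduce noise.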
Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/131496