VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning

Shang, F; Zhou, K; Liu, H; Cheng, J; Tsang, IW; Zhang, L; Tao, D; Jiao, L

Publication Type: Journal Article
Citation: IEEE Transactions on Knowledge and Data Engineering, 2020, 32 (1), pp. 188 - 202
Issue Date: 2020-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Shang, F	en_US
dc.contributor.author	Zhou, K	en_US
dc.contributor.author	Liu, H	en_US
dc.contributor.author	Cheng, J	en_US
dc.contributor.author	Tsang, IW https://orcid.org/0000-0001-8095-4637	en_US
dc.contributor.author	Zhang, L	en_US
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449	en_US
dc.contributor.author	Jiao, L	en_US
dc.date.available	2020-05-25T19:02:21Z
dc.date.issued	2020-01-01	en_US
dc.identifier.citation	IEEE Transactions on Knowledge and Data Engineering, 2020, 32 (1), pp. 188 - 202	en_US
dc.identifier.issn	1041-4347	en_US
dc.identifier.uri	http://hdl.handle.net/10453/131414
dc.description.abstract	In this paper, we propose a simple variant of the original SVRG, called variance reduced stochastic gradient descent (VR-SGD). Unlike the choices of snapshot and starting points in SVRG and its proximal variant, Prox-SVRG, VR-SGD sets these two vectors to the average and the last iterate of the previous epoch, respectively. These settings allow much larger learning rates, but also make the convergence analysis more challenging. We also design two different update rules for smooth and non-smooth objective functions, respectively, so that VR-SGD can tackle non-smooth and/or non-strongly convex problems directly, without any reduction techniques. Moreover, we analyze the convergence properties of VR-SGD for strongly convex problems and show that VR-SGD attains linear convergence. Unlike most algorithms, which have no convergence guarantees for non-strongly convex problems, we also provide convergence guarantees of VR-SGD for this case, and empirically verify that VR-SGD with varying learning rates achieves performance similar to its momentum-accelerated variant, which has the optimal convergence rate O(1/T^2). Finally, we apply VR-SGD to various machine learning problems, such as convex and non-convex empirical risk minimization and leading eigenvalue computation. Experimental results show that VR-SGD converges significantly faster than SVRG and Prox-SVRG, and usually outperforms state-of-the-art accelerated methods, e.g., Katyusha.	en_US
dc.relation	http://purl.org/au-research/grants/arc/FT130100746
dc.relation	http://purl.org/au-research/grants/arc/LP150100671
dc.relation	http://purl.org/au-research/grants/arc/DP180100106
dc.relation	http://purl.org/au-research/grants/arc/DP180103424
dc.relation.ispartof	IEEE Transactions on Knowledge and Data Engineering	en_US
dc.relation.isbasedon	10.1109/TKDE.2018.2878765	en_US
dc.subject.classification	Information Systems	en_US
dc.title	VR-SGD: A Simple Stochastic Variance Reduction Method for Machine Learning	en_US
dc.type	Journal Article
utslib.citation.volume	1	en_US
utslib.citation.volume	32	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	08 Information and Computing Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.issue	1	en_US
pubs.publication-status	Published	en_US
pubs.volume	32	en_US
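The epoch structure described in the abstract (snapshot set to the average of the previous epoch's iterates, next epoch started from the last iterate) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the least-squares objective, the helper names (`vr_sgd`, `grad_i`, `full_grad`, `soft_threshold`), the step size, the inner-loop length m = 2n, and averaging the snapshot over the m inner iterates are all hypothetical choices. For non-smooth objectives it simply assumes a Prox-SVRG-style proximal step; the paper's exact non-smooth update rule may differ.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Prox = Callable[[Vector, float], Vector]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def grad_i(x: Vector, a: Sequence[float], b: float) -> Vector:
    """Gradient of the single-sample loss 0.5 * (a.x - b)^2 (assumed objective)."""
    r = dot(a, x) - b
    return [r * aj for aj in a]


def full_grad(x: Vector, A: Sequence[Sequence[float]], B: Sequence[float]) -> Vector:
    """Average of the per-sample gradients over all n samples."""
    n = len(A)
    g = [0.0] * len(x)
    for a, b in zip(A, B):
        for j, gj in enumerate(grad_i(x, a, b)):
            g[j] += gj / n
    return g


def soft_threshold(r: float) -> Prox:
    """Prox of step * r * ||x||_1, a hypothetical non-smooth regularizer."""
    def prox(x: Vector, step: float) -> Vector:
        t = step * r
        return [math.copysign(max(abs(xj) - t, 0.0), xj) for xj in x]
    return prox


def vr_sgd(A, B, x0: Vector, eta: float = 0.04, epochs: int = 100,
           m: Optional[int] = None, prox: Optional[Prox] = None,
           seed: int = 0) -> Vector:
    """Hypothetical VR-SGD sketch for (optionally L1-regularized) least squares."""
    rng = random.Random(seed)
    n = len(A)
    m = m or 2 * n
    x = list(x0)        # starting point of the current epoch
    x_tilde = list(x0)  # snapshot point
    for _ in range(epochs):
        mu = full_grad(x_tilde, A, B)  # full gradient at the snapshot
        avg = [0.0] * len(x)
        for _ in range(m):
            i = rng.randrange(n)
            gi = grad_i(x, A[i], B[i])
            gi_snap = grad_i(x_tilde, A[i], B[i])
            # Variance-reduced estimator, same form as in SVRG.
            v = [g - gs + mj for g, gs, mj in zip(gi, gi_snap, mu)]
            x = [xj - eta * vj for xj, vj in zip(x, v)]
            if prox is not None:  # assumed non-smooth rule: proximal step
                x = prox(x, eta)
            avg = [s + xj / m for s, xj in zip(avg, x)]
        # VR-SGD choice: the new snapshot is the average of this epoch's
        # iterates, while x is NOT reset, so the next epoch starts from
        # the last iterate.
        x_tilde = avg
    return x_tilde


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0]]
    x_star = [1.0, -2.0]
    B = [dot(a, x_star) for a in A]  # noiseless targets, so x_star is optimal
    print(vr_sgd(A, B, [0.0, 0.0]))
```

On the small noiseless problem in the `__main__` block the sketch should return a vector close to x* = (1, -2). Compared with a textbook SVRG loop, the only differences are the two choices noted in the final comment: the snapshot is the epoch average rather than the last iterate (or a random one), and the inner loop is not restarted from the snapshot.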

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/131414