On the Equivalence between Neural Network and Support Vector Machine

Chen, Y; Huang, W; Nguyen, LM; Weng, TW

Publication Type: Conference Proceeding
Citation: Advances in Neural Information Processing Systems, 2021, 28, pp. 23478-23490
Issue Date: 2021-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Chen, Y
dc.contributor.author	Huang, W
dc.contributor.author	Nguyen, LM
dc.contributor.author	Weng, TW
dc.date.accessioned	2022-07-03T20:10:00Z
dc.date.available	2022-07-03T20:10:00Z
dc.date.issued	2021-01-01
dc.identifier.citation	Advances in Neural Information Processing Systems, 2021, 28, pp. 23478-23490
dc.identifier.isbn	9781713845393
dc.identifier.issn	1049-5258
dc.identifier.uri	http://hdl.handle.net/10453/158563
dc.language	en
dc.relation.ispartof	Advances in Neural Information Processing Systems
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	1701 Psychology, 1702 Cognitive Sciences
dc.title	On the Equivalence between Neural Network and Support Vector Machine
dc.type	Conference Proceeding
utslib.citation.volume	28
utslib.for	1701 Psychology
utslib.for	1702 Cognitive Sciences
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	open_access	*
dc.date.updated	2022-07-03T20:09:59Z
pubs.publication-status	Published
pubs.volume	28

Abstract:

Recent research shows that the dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by the Neural Tangent Kernel (NTK) [27]. Under the squared loss, an infinite-width NN trained by gradient descent with an infinitesimally small learning rate is equivalent to kernel regression with the NTK [4]. However, this equivalence is currently known only for ridge regression [6], while the equivalence between NNs and other kernel machines (KMs), e.g. the support vector machine (SVM), remains unknown. In this work, we therefore establish the equivalence between NN and SVM, specifically between an infinitely wide NN trained with the soft margin loss and the standard soft margin SVM with the NTK trained by subgradient descent. Our main theoretical results include establishing the equivalence between NNs and a broad family of ℓ2-regularized KMs with finite-width bounds, which prior work cannot handle, and showing that every finite-width NN trained with such regularized loss functions is approximately a KM. Furthermore, we demonstrate that our theory enables three practical applications: (i) a non-vacuous generalization bound for the NN via the corresponding KM; (ii) a nontrivial robustness certificate for the infinite-width NN (where existing robustness verification methods would give vacuous bounds); and (iii) infinite-width NNs that are intrinsically more robust than those from previous kernel regression.
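
To make the kernel-machine side of the abstract concrete, the following is a minimal sketch of a soft margin kernel SVM with ℓ2 regularization trained by subgradient descent. It is not the authors' implementation. As a loudly stated assumption, an RBF kernel stands in for the NTK, and the function names, toy data, and hyperparameters (`lam`, `lr`, `steps`, `gamma`) are hypothetical choices made only for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """RBF Gram matrix between rows of A and B (a stand-in for the NTK)."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq, 0.0))


def train_kernel_svm(K, y, lam=0.01, lr=0.1, steps=500):
    """Subgradient descent on an l2-regularized hinge loss.

    The model is f(x) = sum_j alpha[j] * K(x_j, x), i.e. beta = sum_j alpha[j] * Phi(x_j).
    Objective: (lam / 2) * ||beta||^2 + (1 / n) * sum_i max(0, 1 - y_i * f(x_i)).
    A subgradient with respect to beta is
        lam * beta - (1 / n) * sum_{i: margin_i < 1} y_i * Phi(x_i),
    which in terms of the coefficients alpha gives the update below.
    """
    n = K.shape[0]
    alpha = np.zeros(n)
    for _ in range(steps):
        margins = y * (K @ alpha)                 # y_i * f(x_i)
        active = (margins < 1.0).astype(float)    # points violating the margin
        grad = lam * alpha - (y * active) / n     # subgradient in coefficient form
        alpha -= lr * grad
    return alpha


def predict(alpha, K_test_train):
    """Predicted labels from the kernel between test rows and training columns."""
    return np.sign(K_test_train @ alpha)


def make_toy_data(n_per_class=40, seed=0):
    """Two well-separated Gaussian blobs with labels +1 and -1 (illustrative only)."""
    rng = np.random.default_rng(seed)
    X_pos = rng.normal(loc=2.0, scale=0.5, size=(n_per_class, 2))
    X_neg = rng.normal(loc=-2.0, scale=0.5, size=(n_per_class, 2))
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(n_per_class), -np.ones(n_per_class)])
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    K = rbf_kernel(X, X, gamma=0.5)
    alpha = train_kernel_svm(K, y)
    accuracy = np.mean(predict(alpha, K) == y)
    print(f"training accuracy: {accuracy:.3f}")
```

To move closer to the setting described in the abstract, one would replace `rbf_kernel` with an NTK Gram matrix for the chosen architecture. The training loop itself only depends on the Gram matrix `K` and the labels `y`.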

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/158563