Dimensionality-dependent generalization bounds for k-dimensional coding schemes
- Publication Type:
- Journal Article
- Neural Computation, 2016, 28 (10), pp. 2213 - 2249
- Issue Date:
- 2016
© 2016 Massachusetts Institute of Technology. k-dimensional coding schemes are a collection of methods that represent data using a set of representative k-dimensional vectors; they include nonnegative matrix factorization, dictionary learning, sparse coding, k-means clustering, and vector quantization as special cases. Previous generalization bounds for the reconstruction error of k-dimensional coding schemes are mainly dimensionality-independent. A major advantage of such bounds is that they apply when data are mapped into an infinite- or high-dimensional feature space; however, many applications use finite-dimensional features. Can we obtain dimensionality-dependent generalization bounds that are tighter than dimensionality-independent ones when data lie in a finite-dimensional feature space? Yes. In this letter, we address this problem and derive a dimensionality-dependent generalization bound for k-dimensional coding schemes by bounding the covering number of the loss function class induced by the reconstruction error. The bound is of order O((mk ln(mkn)/n)^{λ_n}), where m is the feature dimension, k is the number of columns in the linear implementation of the coding scheme, n is the sample size, λ_n > 0.5 when n is finite, and λ_n = 0.5 when n is infinite. We show that our bound can be tighter than previous results because it avoids inducing the worst-case upper bound on k of the loss function. We also apply the proposed bound to several specific coding schemes to demonstrate that the dimensionality-dependent bound is an indispensable complement to dimensionality-independent generalization bounds.
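To make the reconstruction-error objective concrete, the following minimal Python sketch (illustrative only, not from the paper) computes the empirical reconstruction error for two of the special cases named above: k-means/vector quantization, where each sample is coded by the nearest column of a codebook D with k columns in R^m, and unconstrained dictionary learning, where codes range over all of R^k. The function names and the random data are assumptions made for this sketch.

```python
import numpy as np

def kmeans_reconstruction_error(X, D):
    """Empirical reconstruction error for k-means / vector quantization:
    each sample is coded by its nearest codebook column (a one-hot code)."""
    # Squared distance from every sample (rows of X) to every column of D.
    d2 = ((X[:, None, :] - D.T[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    return d2.min(axis=1).mean()

def dictionary_reconstruction_error(X, D):
    """Empirical reconstruction error when codes range over all of R^k
    (least-squares coding, as in unconstrained dictionary learning)."""
    # The optimal code for each sample is the least-squares solution of D y = x.
    Y, *_ = np.linalg.lstsq(D, X.T, rcond=None)  # shape (k, n)
    residual = X.T - D @ Y
    return (residual ** 2).sum(axis=0).mean()

rng = np.random.default_rng(0)
n, m, k = 500, 20, 5             # sample size, feature dimension, codebook size
X = rng.normal(size=(n, m))      # n samples in an m-dimensional feature space
D = rng.normal(size=(m, k))      # linear implementation: k columns in R^m
print(kmeans_reconstruction_error(X, D))
print(dictionary_reconstruction_error(X, D))
```

Here D plays the role of the linear implementation with k columns referenced in the bound; the generalization question studied in the letter is how fast such empirical reconstruction errors converge to their expectations as the sample size n grows.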