Transform-invariant convolutional neural networks for image classification and search

Publication Type:
Conference Proceeding
Citation:
MM 2016 - Proceedings of the 2016 ACM Multimedia Conference, 2016, pp. 1345–1354
Issue Date:
2016-10-01
File:
p1345-shen.pdf (Published version, Adobe PDF, 6.77 MB)
Abstract:
© 2016 ACM. Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models remain poorly invariant to spatial transformations of input images. Intuitively, with sufficient layers and parameters, hierarchical combinations of convolution (matrix multiplication and nonlinear activation) and pooling operations should be able to learn a robust mapping from transformed input images to transform-invariant representations. In this paper, we propose randomly transforming (rotating, scaling, and translating) the feature maps of CNNs during the training stage. This prevents CNN models from forming complex dependencies on the specific rotation, scale, and translation levels of the training images. Instead, each convolutional kernel learns to detect a feature that remains helpful for producing a transform-invariant answer across the combinatorially large variety of transform levels of its input feature maps. In this way, no extra training supervision or modification to the optimization process or training images is required. We show that random transformation yields significant improvements for CNNs on many benchmark tasks, including small-scale image recognition, large-scale image recognition, and image retrieval.
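A minimal sketch of the idea described in the abstract is given below, written in PyTorch. The framework choice, the module name RandomTransform, and the hyperparameters max_angle, scale_range, and max_shift are illustrative assumptions, not the authors' released implementation. The module warps each sample's feature maps with a random affine transform (rotation, scale, translation) during training and acts as the identity at evaluation time.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class RandomTransform(nn.Module):
    """Randomly rotates, scales, and translates feature maps during training.

    Sketch of the paper's idea: warping feature maps with a random affine
    transform prevents downstream kernels from depending on one fixed pose.
    A no-op in evaluation mode.
    """

    def __init__(self, max_angle=30.0, scale_range=(0.8, 1.2), max_shift=0.1):
        super().__init__()
        self.max_angle = max_angle      # rotation bound, in degrees (assumed value)
        self.scale_range = scale_range  # multiplicative scale bounds (assumed)
        self.max_shift = max_shift      # translation, as a fraction of map size (assumed)

    def forward(self, x):
        if not self.training:
            return x  # identity at inference time
        n = x.size(0)
        # Sample one random transform per example in the batch.
        angle = torch.empty(n).uniform_(-self.max_angle, self.max_angle) * math.pi / 180.0
        scale = torch.empty(n).uniform_(*self.scale_range)
        tx = torch.empty(n).uniform_(-self.max_shift, self.max_shift)
        ty = torch.empty(n).uniform_(-self.max_shift, self.max_shift)
        cos, sin = torch.cos(angle) * scale, torch.sin(angle) * scale
        # Assemble a 2x3 affine matrix per sample: rotation+scale, then translation.
        theta = torch.stack([
            torch.stack([cos, -sin, tx], dim=1),
            torch.stack([sin,  cos, ty], dim=1),
        ], dim=1).to(x.device, x.dtype)
        # Warp the feature maps by bilinear resampling on the affine grid.
        grid = F.affine_grid(theta, list(x.shape), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

In use, the module would sit between convolutional stages, e.g. nn.Sequential(conv1, RandomTransform(), conv2); because it is the identity in evaluation mode, the inference path of the network is unchanged.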