Gated Channel Transformation for Visual Recognition

Yang, Z; Zhu, L; Wu, Y; Yang, Y

Publisher:: IEEE
Publication Type:: Conference Proceeding
Citation:: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020, 00, pp. 11791-11800
Issue Date:: 2020-01-01

This item is open access; the embargo period expired on 1 Jan 2022.

Full metadata record

Field	Value	Language
dc.contributor.author	Yang, Z https://orcid.org/0000-0001-8783-8313
dc.contributor.author	Zhu, L https://orcid.org/0000-0002-4093-7557
dc.contributor.author	Wu, Y https://orcid.org/0000-0002-1680-8253
dc.contributor.author	Yang, Y https://orcid.org/0000-0002-0512-880X
dc.date	2020-06-13
dc.date.accessioned	2021-04-21T09:47:39Z
dc.date.available	2021-04-21T09:47:39Z
dc.date.issued	2020-01-01
dc.identifier.citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020, 00, pp. 11791-11800
dc.identifier.isbn	978-1-7281-7168-5
dc.identifier.issn	1063-6919
dc.identifier.issn	2575-7075
dc.identifier.uri	http://hdl.handle.net/10453/148265
dc.language	en
dc.publisher	IEEE
dc.relation	http://purl.org/au-research/grants/arc/DP200100938
dc.relation.ispartof	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.relation.ispartof	IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.relation.isbasedon	10.1109/CVPR42600.2020.01181
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.title	Gated Channel Transformation for Visual Recognition
dc.type	Conference Proceeding
utslib.citation.volume	00
utslib.location.activity	Seattle, WA, USA
utslib.for	0801 Artificial Intelligence and Image Processing
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
pubs.consider-herdc	false
utslib.copyright.embargo	2022-01-01T00:00:00+1000Z
dc.date.updated	2021-04-21T09:47:31Z
pubs.finish-date	2020-06-19
pubs.place-of-publication	Piscataway, USA
pubs.publication-status	Published
pubs.start-date	2020-06-13
pubs.volume	00
dc.location	Piscataway, USA

Abstract:

In this work, we propose a generally applicable transformation unit for visual recognition with deep convolutional neural networks. This transformation explicitly models channel relationships with explainable control variables. These variables determine whether neurons compete or cooperate, and they are jointly optimized with the convolutional weights towards more accurate recognition. In Squeeze-and-Excitation (SE) Networks, the channel relationships are implicitly learned by fully connected layers, and the SE block is integrated at the block level. We instead introduce a channel normalization layer to reduce the number of parameters and the computational complexity. This lightweight layer incorporates a simple ℓ2 normalization, which makes our transformation unit applicable at the operator level without a significant increase in parameters. Extensive experiments demonstrate the effectiveness of our unit, with clear margins on many vision tasks: image classification on ImageNet, object detection and instance segmentation on COCO, and video classification on Kinetics.
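
The abstract names three ingredients: an ℓ2-based embedding of each channel, a channel normalization layer, and gates that let channels compete or cooperate. Below is a minimal pure-Python sketch of how such a unit could act on a single feature map. It assumes the three-step formulation (global context embedding, channel normalization, gating adaptation) given in the published paper as we understand it. The names `gct_gates`, `gct_forward`, `gct_params` and `se_params`, the per-channel parameters `alpha`, `gamma` and `beta`, the `eps` value, and the list-of-channels layout are all our assumptions. A practical version would operate on batched tensors in a deep learning framework. The sketch also counts parameters to illustrate the abstract's efficiency claim, assuming an SE block with two fully connected layers and a reduction ratio of r = 16 (biases ignored).

```python
import math
from typing import List

# Assumed layout: C channels, each a flat list of its H*W activations.
Channels = List[List[float]]


def gct_gates(x: Channels, alpha: List[float], gamma: List[float],
              beta: List[float], eps: float = 1e-5) -> List[float]:
    """Return one multiplicative gate per channel (hypothetical helper)."""
    num_channels = len(x)
    # 1) Global context embedding: each channel's l2 norm, scaled by alpha.
    s = [alpha[c] * math.sqrt(sum(v * v for v in x[c]) + eps)
         for c in range(num_channels)]
    # 2) Channel normalization: l2-normalize the embeddings across channels
    #    and rescale by sqrt(C), so their root-mean-square is about 1
    #    regardless of the number of channels.
    norm = math.sqrt(sum(v * v for v in s) + eps)
    s_hat = [math.sqrt(num_channels) * v / norm for v in s]
    # 3) Gating adaptation: 1 + tanh(.) lies in (0, 2). With positive alpha,
    #    gamma > 0 boosts channels with large embeddings (competition) and
    #    gamma < 0 boosts channels with small ones (cooperation).
    return [1.0 + math.tanh(gamma[c] * s_hat[c] + beta[c])
            for c in range(num_channels)]


def gct_forward(x: Channels, alpha: List[float], gamma: List[float],
                beta: List[float], eps: float = 1e-5) -> Channels:
    """Scale every activation of channel c by that channel's gate."""
    gates = gct_gates(x, alpha, gamma, beta, eps)
    return [[g * v for v in channel] for g, channel in zip(gates, x)]


def gct_params(channels: int) -> int:
    # One alpha, gamma and beta per channel.
    return 3 * channels


def se_params(channels: int, reduction: int = 16) -> int:
    # Weights of the two FC layers in an SE block (C -> C/r -> C), no biases.
    return 2 * channels * (channels // reduction)


x = [[1.0, 2.0], [3.0, 4.0], [0.5, -0.5]]
ones, zeros = [1.0] * 3, [0.0] * 3
# gamma = beta = 0 gives gates of exactly 1, so the unit starts as an identity map.
print(gct_forward(x, ones, zeros, zeros) == x)  # True
print(gct_params(256), se_params(256))  # 768 8192
```

Because the sketched gate adds only 3C parameters per layer (768 for C = 256, versus 8,192 for the assumed SE bottleneck), it is cheap enough to place before individual convolutions, consistent with the abstract's point about operator-level integration. Initializing `gamma` and `beta` to zero, as in the example, makes the unit start as an identity map; treat this initialization as an assumption of the sketch.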

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/148265