Scale-aware crowd counting via depth-embedded convolutional neural networks

Zhao, M; Zhang, C; Zhang, J; Porikli, F; Ni, B; Zhang, W

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30, (10), pp. 3651-3662
Issue Date: 2020-10-01

Closed Access

	Filename	Description	Size	Format
	08846233.pdf	Published version	5.06 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhao, M
dc.contributor.author	Zhang, C
dc.contributor.author	Zhang, J https://orcid.org/0000-0002-7240-3541
dc.contributor.author	Porikli, F
dc.contributor.author	Ni, B
dc.contributor.author	Zhang, W
dc.date.accessioned	2020-11-26T00:12:37Z
dc.date.available	2020-11-26T00:12:37Z
dc.date.issued	2020-10-01
dc.identifier.citation	IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30, (10), pp. 3651-3662
dc.identifier.issn	1051-8215
dc.identifier.issn	1558-2205
dc.identifier.uri	http://hdl.handle.net/10453/144355
dc.description.abstract	© 1991-2012 IEEE. Scale variation of pedestrians in a crowd image presents a significant challenge for vision-based people counting systems. Such variations are mainly caused by perspective-related distortions due to the camera pose relative to the ground plane. Following the density-based counting paradigm, we postulate that generating density values adaptive to object scales plays a critical role in the accuracy of the final counting results. Motivated by this, we distill the underlying information from depth cues to obtain scale-aware representations that can respond to object scales considering the fact that the scale is inversely proportional to the object depth. Specifically, we propose a depth embedding module as add-ons into existing networks. This module exploits essential depth cues to spatially re-calibrate the magnitude of the original features. In this way, the objects, although in the same class, will attain distinct representations according to their scales, which directly benefits the estimation of scale-aware density values. We conduct a comprehensive analysis of the effects of the depth embedding module and validate that exploiting depth cues to perceive object scale variations in convolutional neural networks improves crowd counting performances. Our experiments demonstrate the effectiveness of the proposed approach on four popular benchmark datasets.
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation	National ICT Australia
dc.relation.ispartof	IEEE Transactions on Circuits and Systems for Video Technology
dc.relation.isbasedon	10.1109/TCSVT.2019.2943010
dc.rights	© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	0801 Artificial Intelligence and Image Processing, 0906 Electrical and Electronic Engineering
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Scale-aware crowd counting via depth-embedded convolutional neural networks
dc.type	Journal Article
utslib.citation.volume	30
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0906 Electrical and Electronic Engineering
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	closed_access	*
dc.date.updated	2020-11-26T00:12:31Z
pubs.issue	10
pubs.publication-status	Published
pubs.volume	30
utslib.citation.issue	10
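
The abstract describes a depth embedding module that uses depth cues to spatially re-calibrate the magnitude of existing features, relying on the fact that object scale is inversely proportional to depth. The record does not give the module's actual formulation, so the following is only a minimal NumPy sketch of that general idea. The function names, the inverse-depth weighting, and the max-normalisation are all assumptions made for illustration, not the authors' method.

```python
import numpy as np


def scale_from_depth(depth, eps=1e-6):
    """Relative object scale, assuming scale is inversely proportional to depth.

    `eps` is a hypothetical guard against division by zero.
    """
    return 1.0 / (depth + eps)


def recalibrate(features, depth, eps=1e-6):
    """Spatially rescale a (C, H, W) feature map with an (H, W) depth map.

    Assumed weighting (not taken from the paper): per-pixel inverse depth,
    normalised so the nearest pixel keeps its original magnitude.
    """
    if features.shape[1:] != depth.shape:
        raise ValueError("depth must match the spatial size of features")
    weights = scale_from_depth(depth, eps)
    weights = weights / weights.max()
    # Broadcast the (H, W) weights across all C channels.
    return features * weights[None, :, :]


# Toy input: two channels of ones and a depth map that doubles per pixel.
features = np.ones((2, 2, 2))
depth = np.array([[1.0, 2.0], [4.0, 8.0]])
recalibrated = recalibrate(features, depth)
```

In this toy case, identical input features end up with different magnitudes depending on depth, which is the kind of scale-dependent representation the abstract refers to. A learned module would presumably replace the fixed inverse-depth weighting.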


Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/144355