Crowd Counting in Low-Resolution Crowded Scenes Using Region-Based Deep Convolutional Neural Networks

Saqib, M; Khan, SD; Sharma, N; Blumenstein, M

Crowd Counting in Low-Resolution Crowded Scenes Using Region-Based Deep Convolutional Neural Networks

Saqib, M

Khan, SD Sharma, N

Blumenstein, M

Permalink

Publication Type:: Journal Article
Citation:: IEEE Access, 2019, 7 pp. 35317 - 35329
Issue Date:: 2019-01-01

Open Access

Copyright Clearance Process

Recently Added
In Progress
Open Access

This item is open access.

Download Submitted VersionAdobe PDF (2.39 MB)

View on publisher's site

View statistics

Full metadata record

Field	Value	Language
dc.contributor.author	Saqib, M https://orcid.org/0000-0003-4374-0888	en_US
dc.contributor.author	Khan, SD	en_US
dc.contributor.author	Sharma, N https://orcid.org/0000-0003-0841-1245	en_US
dc.contributor.author	Blumenstein, M https://orcid.org/0000-0002-9908-3744	en_US
dc.date.issued	2019-01-01	en_US
dc.identifier.citation	IEEE Access, 2019, 7 pp. 35317 - 35329	en_US
dc.identifier.uri	http://hdl.handle.net/10453/134578
dc.description.abstract	© 2013 IEEE. Crowd counting and density estimation is an important and challenging problem in the visual analysis of the crowd. Most of the existing approaches use regression on density maps for the crowd count from a single image. However, these methods cannot localize individual pedestrian and therefore cannot estimate the actual distribution of pedestrians in the environment. On the other hand, detection-based methods detect and localize pedestrians in the scene, but the performance of these methods degrades when applied in high-density situations. To overcome the limitations of pedestrian detectors, we proposed a motion-guided filter (MGF) that exploits spatial and temporal information between consecutive frames of the video to recover missed detections. Our framework is based on the deep convolution neural network (DCNN) for crowd counting in the low-to-medium density videos. We employ various state-of-the-art network architectures, namely, Visual Geometry Group (VGG16), Zeiler and Fergus (ZF), and VGGM in the framework of a region-based DCNN for detecting pedestrians. After pedestrian detection, the proposed motion guided filter is employed. We evaluate the performance of our approach on three publicly available datasets. The experimental results demonstrate the effectiveness of our approach, which significantly improves the performance of the state-of-the-art detectors.	en_US
dc.relation.ispartof	IEEE Access	en_US
dc.relation.isbasedon	10.1109/ACCESS.2019.2904712	en_US
dc.title	Crowd Counting in Low-Resolution Crowded Scenes Using Region-Based Deep Convolutional Neural Networks	en_US
dc.type	Journal Article
utslib.citation.volume	7	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	08 Information and Computing Sciences	en_US
utslib.for	09 Engineering	en_US
utslib.for	10 Technology	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Strength - QSI - Centre for Quantum Software and Information
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	7	en_US

Abstract:

© 2013 IEEE. Crowd counting and density estimation is an important and challenging problem in the visual analysis of the crowd. Most of the existing approaches use regression on density maps for the crowd count from a single image. However, these methods cannot localize individual pedestrian and therefore cannot estimate the actual distribution of pedestrians in the environment. On the other hand, detection-based methods detect and localize pedestrians in the scene, but the performance of these methods degrades when applied in high-density situations. To overcome the limitations of pedestrian detectors, we proposed a motion-guided filter (MGF) that exploits spatial and temporal information between consecutive frames of the video to recover missed detections. Our framework is based on the deep convolution neural network (DCNN) for crowd counting in the low-to-medium density videos. We employ various state-of-the-art network architectures, namely, Visual Geometry Group (VGG16), Zeiler and Fergus (ZF), and VGGM in the framework of a region-based DCNN for detecting pedestrians. After pedestrian detection, the proposed motion guided filter is employed. We evaluate the performance of our approach on three publicly available datasets. The experimental results demonstrate the effectiveness of our approach, which significantly improves the performance of the state-of-the-art detectors.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/134578