Scale-aware crowd counting via depth-embedded convolutional neural networks

Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Journal Article
IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30, (10), pp. 3651-3662
Issue Date:
Filename Description Size
08846233.pdfPublished version5.06 MB
Adobe PDF
Full metadata record
© 1991-2012 IEEE. Scale variation of pedestrians in a crowd image presents a significant challenge for vision-based people counting systems. Such variations are mainly caused by perspective-related distortions due to the camera pose relative to the ground plane. Following the density-based counting paradigm, we postulate that generating density values adaptive to object scales plays a critical role in the accuracy of the final counting results. Motivated by this, we distill the underlying information from depth cues to obtain scale-aware representations that can respond to object scales considering the fact that the scale is inversely proportional to the object depth. Specifically, we propose a depth embedding module as add-ons into existing networks. This module exploits essential depth cues to spatially re-calibrate the magnitude of the original features. In this way, the objects, although in the same class, will attain distinct representations according to their scales, which directly benefits the estimation of scale-aware density values. We conduct a comprehensive analysis of the effects of the depth embedding module and validate that exploiting depth cues to perceive object scale variations in convolutional neural networks improves crowd counting performances. Our experiments demonstrate the effectiveness of the proposed approach on four popular benchmark datasets.
Please use this identifier to cite or link to this item: