Learning a perspective-embedded deconvolution network for crowd counting

Publication Type:
Conference Proceeding
Proceedings - IEEE International Conference on Multimedia and Expo, 2017, pp. 403 - 408
© 2017 IEEE. We present a novel deep learning framework for crowd counting based on a perspective-embedded deconvolution network. Perspective is an inherent property of most surveillance scenes. Unlike traditional approaches that apply perspective as a separate normalization step, we propose to fuse the perspective into a deconvolution network, aiming to obtain a robust, accurate and consistent crowd density map. Through layer-wise fusion, we merge perspective maps at different resolutions into the deconvolution network. With the injection of perspective, our network is driven to adaptively incorporate the underlying geometric constraints of the scene, thus enabling an accurate mapping from high-level feature maps to the pixel-wise crowd density map. In addition, our network can generate a density map for an arbitrary-sized input in an end-to-end fashion. The proposed method achieves competitive results on the WorldExpo2010 crowd dataset.
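The layer-wise fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the perspective map is merged at each layer, so the channel concatenation used here is an assumption, and nearest-neighbour upsampling stands in for a learned deconvolution layer. The names `avg_pool`, `upsample2x`, `fuse` and `decode` are hypothetical and do not come from the paper.

```python
# Illustrative sketch only. Assumptions (not taken from the paper):
#   - fusion is done by channel concatenation,
#   - nearest-neighbour upsampling replaces the learned deconvolution,
#   - the perspective map is resized by average pooling.

def avg_pool(grid, factor):
    """Downsample a 2-D list by averaging non-overlapping factor x factor blocks."""
    h, w = len(grid) // factor, len(grid[0]) // factor
    return [
        [
            sum(grid[i * factor + di][j * factor + dj]
                for di in range(factor) for dj in range(factor)) / (factor * factor)
            for j in range(w)
        ]
        for i in range(h)
    ]


def upsample2x(grid):
    """Nearest-neighbour 2x upsampling (stand-in for a deconvolution layer)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse(channels, perspective):
    """Append the perspective map as an extra channel at matching resolution."""
    assert len(perspective) == len(channels[0])
    assert len(perspective[0]) == len(channels[0][0])
    return channels + [perspective]


def decode(features, perspective, stages):
    """Upsample `stages` times, fusing a resized perspective map after each step."""
    channels = features
    for s in range(1, stages + 1):
        channels = [upsample2x(c) for c in channels]
        factor = 2 ** (stages - s)  # remaining gap to full resolution
        p = perspective if factor == 1 else avg_pool(perspective, factor)
        channels = fuse(channels, p)
    return channels


if __name__ == "__main__":
    features = [[[1.0, 2.0], [3.0, 4.0]]]  # one 2x2 high-level feature map
    perspective = [[float(r) for _ in range(8)] for r in range(8)]  # varies by row
    fused = decode(features, perspective, stages=2)
    print(len(fused), len(fused[0]), len(fused[0][0]))
```

In a trained network, a convolution after each fusion step would mix the perspective channel with the feature channels; the sketch only shows how perspective maps at matching resolutions can be attached at each upsampling stage.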