Learning a perspective-embedded deconvolution network for crowd counting
- Publication Type:
- Conference Proceeding
- Proceedings - IEEE International Conference on Multimedia and Expo, 2017, pp. 403 - 408
- Issue Date:
Copyright Clearance Process
- Recently Added
- In Progress
- Open Access
This item is currently unavailable due to the publisher's embargo.
The embargo period expires on 28 Aug 2019
© 2017 IEEE. We present a novel deep learning framework for crowd counting by learning a perspective-embedded deconvolution network. Perspective is an inherent property of most surveillance scenes. Unlike the traditional approaches that exploit the perspective as a separate normalization, we propose to fuse the perspective into a deconvolution network, aiming to obtain a robust, accurate and consistent crowd density map. Through layer-wise fusion, we merge perspective maps at different resolutions into the deconvolution network. With the injection of perspective, our network is driven to learn to combine the underlying scene geometric constraints adaptively, thus enabling an accurate interpretation from high-level feature maps to the pixel-wise crowd density map. In addition, our network allows generating density map for arbitrary-sized input in an end-to-end fashion. The proposed method achieves competitive result on the WorldExpo2010 crowd dataset.
Please use this identifier to cite or link to this item: