Mask assisted object coding with deep learning for object retrieval in surveillance videos

Publication Type:
Conference Proceeding
Citation:
MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia, 2014, pp. 1109 - 1112
Issue Date:
2014-01-01
Files in This Item:
Filename         Description        Size      Format
p1109-teng.pdf   Published version  3.11 MB   Adobe PDF
Retrieving visual objects from a large-scale video dataset is a focus of multimedia research, but it remains a challenging task because of imprecise object extraction and partial occlusion. This paper presents a novel approach to efficiently encode and retrieve visual objects that addresses some practical complications in surveillance videos. Specifically, we use mask information to assist object representation and develop an encoding method based on a highly nonlinear mapping learned by a deep neural network. Furthermore, we add occlusion noise to the learning process to improve robustness to background noise and partial occlusions. A real-life surveillance video dataset containing over 10 million objects is built to evaluate the proposed approach. Experimental results show that our approach significantly outperforms state-of-the-art solutions for object retrieval in large-scale video datasets.
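The two preprocessing ideas named in the abstract, using a foreground mask to suppress background pixels and corrupting training inputs with occlusion noise, can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the form of the noise, so the single rectangular occluder, the zero fill value, and the names `apply_mask` and `add_occlusion` are all assumptions made for illustration.

```python
import random

def apply_mask(patch, mask):
    """Zero out pixels outside the foreground mask (mask values are 0 or 1)."""
    return [[p * m for p, m in zip(prow, mrow)]
            for prow, mrow in zip(patch, mask)]

def add_occlusion(patch, box_h, box_w, rng, fill=0.0):
    """Return a copy of `patch` with one randomly placed box set to `fill`.

    Assumption: a single axis-aligned rectangle stands in for the
    "occluded noise" mentioned in the abstract.
    """
    height, width = len(patch), len(patch[0])
    top = rng.randint(0, height - box_h)    # randint is inclusive at both ends
    left = rng.randint(0, width - box_w)
    occluded = [row[:] for row in patch]    # copy so the input stays intact
    for r in range(top, top + box_h):
        for c in range(left, left + box_w):
            occluded[r][c] = fill
    return occluded

# Toy 6x6 grayscale patch whose foreground occupies columns 1..4.
rng = random.Random(0)
patch = [[1.0] * 6 for _ in range(6)]
mask = [[1 if 1 <= c <= 4 else 0 for c in range(6)] for _ in range(6)]

clean = apply_mask(patch, mask)                          # background removed
noisy = add_occlusion(clean, box_h=2, box_w=2, rng=rng)  # simulated occlusion
```

In a training setup of this kind, corrupted inputs such as `noisy` would be presented to the network so that the learned code becomes less sensitive to missing regions; how the paper pairs corrupted and clean inputs is not stated in the abstract.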