Scalable video event retrieval by visual state binary embedding

Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Journal Article
Citation:
IEEE Transactions on Multimedia, vol. 18, no. 8, pp. 1590-1603, 2016
Issue Date:
2016-08-01
Filename:
Scalable_Video_Event_Retrieval_by_Visual_State_Binary_Embedding.pdf
Description:
Published version
Size:
1.56 MB
Format:
Adobe PDF
Abstract:
With the exponential growth of media data on the web, fast media retrieval has become a significant research topic in multimedia content analysis. Among the variety of techniques, learning binary embedding (hashing) functions is one of the most popular approaches to scalable retrieval in large databases, and it is mainly used for near-duplicate multimedia search. To date, however, most hashing methods have been designed for near-duplicate retrieval at the visual level rather than the semantic level. In this paper, we propose a visual state binary embedding (VSBE) model that encodes video frames into binary matrices while preserving their essential semantic information, enabling fast video event retrieval in unconstrained cases. Compared with other video binary embedding models, one advantage of the proposed VSBE model is that it needs only a limited number of key frames from the training videos to train the hash functions, so the computational complexity of the training phase is much lower. At the same time, we apply pairwise constraints generated from the visual states to capture the local properties of events at the semantic level, so retrieval accuracy is preserved. We conducted extensive experiments on the challenging TRECVID MED dataset, and the results demonstrate the superiority of the proposed VSBE model.
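The retrieval mechanism the abstract describes, encoding frames as compact binary codes and ranking by Hamming distance, can be illustrated with a minimal sketch. This is not the paper's VSBE model: the random linear projection below stands in for the learned, pairwise-constrained hash functions, and all dimensions and data are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, bits = 128, 64                   # feature dimension and code length (assumed values)
W = rng.standard_normal((d, bits))  # stand-in for a learned hash projection

def encode(X):
    """Map real-valued features of shape (n, d) to binary codes of shape (n, bits)."""
    return (X @ W > 0).astype(np.uint8)

db_feats = rng.standard_normal((10_000, d))  # placeholder key-frame features
db_codes = encode(db_feats)

query = rng.standard_normal((1, d))          # placeholder query-frame feature
q_code = encode(query)

# Hamming distance = number of differing bits; smaller means more similar.
dists = np.count_nonzero(db_codes != q_code, axis=1)
top10 = np.argsort(dists)[:10]
print("indices of the 10 nearest database frames:", top10)
```

Because the codes are binary, the database index is small and each comparison is a bitwise operation, which is what makes this family of methods scale to large collections.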