Joint attributes and event analysis for multimedia event detection

Publication Type:
Journal Article
IEEE Transactions on Neural Networks and Learning Systems, 2018, 29 (7), pp. 2921 - 2930
Issue Date:
Filename Description Size
07949100.pdfPublished Version2.25 MB
Adobe PDF
Full metadata record
© 2012 IEEE. Semantic attributes have been increasingly used the past few years for multimedia event detection (MED) with promising results. The motivation is that multimedia events generally consist of lower level components such as objects, scenes, and actions. By characterizing multimedia event videos with semantic attributes, one could exploit more informative cues for improved detection results. Much existing work obtains semantic attributes from images, which may be suboptimal for video analysis since these image-inferred attributes do not carry dynamic information that is essential for videos. To address this issue, we propose to learn semantic attributes from external videos using their semantic labels. We name them video attributes in this paper. In contrast with multimedia event videos, these external videos depict lower level contents such as objects, scenes, and actions. To harness video attributes, we propose an algorithm established on a correlation vector that correlates them to a target event. Consequently, we could incorporate video attributes latently as extra information into the event detector learnt from multimedia event videos in a joint framework. To validate our method, we perform experiments on the real-world large-scale TRECVID MED 2013 and 2014 data sets and compare our method with several state-of-the-art algorithms. The experiments show that our method is advantageous for MED.
Please use this identifier to cite or link to this item: