Joint attributes and event analysis for multimedia event detection

Ma, Z; Chang, X; Xu, Z; Sebe, N; Hauptmann, AG

Publication Type: Journal Article
Citation: IEEE Transactions on Neural Networks and Learning Systems, 2018, 29 (7), pp. 2921-2930
Issue Date: 2018-07-01

Closed Access

	Filename	Description	Size	Format
	07949100.pdf	Published Version	2.25 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Ma, Z	en_US
dc.contributor.author	Chang, X https://orcid.org/0000-0002-7778-8807	en_US
dc.contributor.author	Xu, Z	en_US
dc.contributor.author	Sebe, N	en_US
dc.contributor.author	Hauptmann, AG	en_US
dc.date.issued	2018-07-01	en_US
dc.identifier.citation	IEEE Transactions on Neural Networks and Learning Systems, 2018, 29 (7), pp. 2921 - 2930	en_US
dc.identifier.issn	2162-237X	en_US
dc.identifier.uri	http://hdl.handle.net/10453/125694
dc.description.abstract	© 2012 IEEE. Semantic attributes have been increasingly used over the past few years for multimedia event detection (MED), with promising results. The motivation is that multimedia events generally consist of lower-level components such as objects, scenes, and actions. By characterizing multimedia event videos with semantic attributes, one could exploit more informative cues for improved detection results. Much existing work obtains semantic attributes from images, which may be suboptimal for video analysis since these image-inferred attributes do not carry the dynamic information that is essential for videos. To address this issue, we propose to learn semantic attributes from external videos using their semantic labels. We name them video attributes in this paper. In contrast with multimedia event videos, these external videos depict lower-level contents such as objects, scenes, and actions. To harness video attributes, we propose an algorithm built on a correlation vector that correlates them to a target event. Consequently, we can incorporate video attributes latently, as extra information, into the event detector learnt from multimedia event videos in a joint framework. To validate our method, we perform experiments on the real-world large-scale TRECVID MED 2013 and 2014 data sets and compare our method with several state-of-the-art algorithms. The experiments show that our method is advantageous for MED.	en_US
dc.relation.ispartof	IEEE Transactions on Neural Networks and Learning Systems	en_US
dc.relation.isbasedon	10.1109/TNNLS.2017.2709308	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Joint attributes and event analysis for multimedia event detection	en_US
dc.type	Journal Article
utslib.citation.volume	29	en_US
utslib.for	0805 Distributed Computing	en_US
utslib.for	0803 Computer Software	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access
pubs.issue	7	en_US
pubs.publication-status	Published	en_US
pubs.volume	29	en_US

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/125694