Multimedia event detection using a classifier-specific intermediate representation

Ma, Z; Yang, Y; Sebe, N; Zheng, K; Hauptmann, AG

Multimedia event detection using a classifier-specific intermediate representation

Ma, Z Yang, Y

Sebe, N Zheng, K Hauptmann, AG

Permalink

Publication Type:: Journal Article
Citation:: IEEE Transactions on Multimedia, 2013, 15 (7), pp. 1628 - 1637
Issue Date:: 2013-10-31

Closed Access

	Filename	Description	Size
	06522476.pdf	Published Version	2.06 MB	Adobe PDF	View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Ma, Z	en_US
dc.contributor.author	Yang, Y https://orcid.org/0000-0001-5528-0546	en_US
dc.contributor.author	Sebe, N	en_US
dc.contributor.author	Zheng, K	en_US
dc.contributor.author	Hauptmann, AG	en_US
dc.date.issued	2013-10-31	en_US
dc.identifier.citation	IEEE Transactions on Multimedia, 2013, 15 (7), pp. 1628 - 1637	en_US
dc.identifier.issn	1520-9210	en_US
dc.identifier.uri	http://hdl.handle.net/10453/116403
dc.description.abstract	Multimedia event detection (MED) plays an important role in many applications such as video indexing and retrieval. Current event detection works mainly focus on sports and news event detection or abnormality detection in surveillance videos. Differently, our research aims to detect more complicated and generic events within a longer video sequence. In the past, researchers have proposed using intermediate concept classifiers with concept lexica to help understand the videos. Yet it is difficult to judge how many and what concepts would be sufficient for the particular video analysis task. Additionally, obtaining robust semantic concept classifiers requires a large number of positive training examples, which in turn has high human annotation cost. In this paper, we propose an approach that exploits the external concepts-based videos and event-based videos simultaneously to learn an intermediate representation from video features. Our algorithm integrates the classifier inference and latent intermediate representation into a joint framework. The joint optimization of the intermediate representation and the classifier makes them mutually beneficial and reciprocal. Effectively, the intermediate representation and the classifier are tightly correlated. The classifier dependent intermediate representation not only accurately reflects the task semantics but is also more suitable for the specific classifier. Thus we have created a discriminative semantic analysis framework based on a tightly coupled intermediate representation. Extensive experiments on multimedia event detection using real-world videos demonstrate the effectiveness of the proposed approach. © 2013 IEEE.	en_US
dc.relation.ispartof	IEEE Transactions on Multimedia	en_US
dc.relation.isbasedon	10.1109/TMM.2013.2264928	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Multimedia event detection using a classifier-specific intermediate representation	en_US
dc.type	Journal Article
utslib.citation.volume	7	en_US
utslib.citation.volume	15	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	08 Information and Computing Sciences	en_US
utslib.for	09 Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	7	en_US
pubs.publication-status	Published	en_US
pubs.volume	15	en_US

Abstract:

Multimedia event detection (MED) plays an important role in many applications such as video indexing and retrieval. Current event detection works mainly focus on sports and news event detection or abnormality detection in surveillance videos. Differently, our research aims to detect more complicated and generic events within a longer video sequence. In the past, researchers have proposed using intermediate concept classifiers with concept lexica to help understand the videos. Yet it is difficult to judge how many and what concepts would be sufficient for the particular video analysis task. Additionally, obtaining robust semantic concept classifiers requires a large number of positive training examples, which in turn has high human annotation cost. In this paper, we propose an approach that exploits the external concepts-based videos and event-based videos simultaneously to learn an intermediate representation from video features. Our algorithm integrates the classifier inference and latent intermediate representation into a joint framework. The joint optimization of the intermediate representation and the classifier makes them mutually beneficial and reciprocal. Effectively, the intermediate representation and the classifier are tightly correlated. The classifier dependent intermediate representation not only accurately reflects the task semantics but is also more suitable for the specific classifier. Thus we have created a discriminative semantic analysis framework based on a tightly coupled intermediate representation. Extensive experiments on multimedia event detection using real-world videos demonstrate the effectiveness of the proposed approach. © 2013 IEEE.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/116403