We are not equally negative: Fine-grained labeling for multimedia event detection

Ma, Z; Yang, Y; Xu, Z; Sebe, N; Hauptmann, AG

Publication Type: Conference Proceeding
Citation: MM 2013 - Proceedings of the 2013 ACM Multimedia Conference, 2013, pp. 293 - 302
Issue Date: 2013-11-18

Closed Access

Filename	Description	Size	Format
wearenot.pdf	Published version	1.71 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Ma, Z	en_US
dc.contributor.author	Yang, Y https://orcid.org/0000-0001-5528-0546	en_US
dc.contributor.author	Xu, Z	en_US
dc.contributor.author	Sebe, N	en_US
dc.contributor.author	Hauptmann, AG	en_US
dc.date.issued	2013-11-18	en_US
dc.identifier.citation	MM 2013 - Proceedings of the 2013 ACM Multimedia Conference, 2013, pp. 293 - 302	en_US
dc.identifier.isbn	9781450324045	en_US
dc.identifier.uri	http://hdl.handle.net/10453/120169
dc.description.abstract	Multimedia event detection (MED) is an effective technique for video indexing and retrieval. Current classifier training for MED treats all negative videos equally. However, many negative videos may resemble the positive videos to different degrees. Intuitively, we may capture more informative cues from the negative videos if we assign them fine-grained labels, thus benefiting classifier learning. To this end, we apply a statistical method to both the positive and negative examples to identify the decisive attributes of a specific event. Based on these decisive attributes, we assign fine-grained labels to the negative examples so that they are treated differently and exploited more effectively. The resulting fine-grained labels may not be accurate enough to characterize the negative videos. Hence, we propose to jointly optimize the fine-grained labels using knowledge from both the visual features and the attribute representations, so that the two sources reinforce each other. Our model yields two kinds of classifiers, one from the attributes and one from the features, both of which incorporate the informative cues from the fine-grained labels. The outputs of both classifiers on the testing videos are fused for detection. Extensive experiments on the challenging TRECVID MED 2012 development set validate the efficacy of the proposed approach. Copyright © 2013 ACM.	en_US
dc.relation.ispartof	MM 2013 - Proceedings of the 2013 ACM Multimedia Conference	en_US
dc.relation.isbasedon	10.1145/2502081.2502119	en_US
dc.title	We are not equally negative: Fine-grained labeling for multimedia event detection	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Multimedia event detection (MED) is an effective technique for video indexing and retrieval. Current classifier training for MED treats all negative videos equally. However, many negative videos may resemble the positive videos to different degrees. Intuitively, we may capture more informative cues from the negative videos if we assign them fine-grained labels, thus benefiting classifier learning. To this end, we apply a statistical method to both the positive and negative examples to identify the decisive attributes of a specific event. Based on these decisive attributes, we assign fine-grained labels to the negative examples so that they are treated differently and exploited more effectively. The resulting fine-grained labels may not be accurate enough to characterize the negative videos. Hence, we propose to jointly optimize the fine-grained labels using knowledge from both the visual features and the attribute representations, so that the two sources reinforce each other. Our model yields two kinds of classifiers, one from the attributes and one from the features, both of which incorporate the informative cues from the fine-grained labels. The outputs of both classifiers on the testing videos are fused for detection. Extensive experiments on the challenging TRECVID MED 2012 development set validate the efficacy of the proposed approach. Copyright © 2013 ACM.
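
The pipeline outlined in the abstract (select decisive attributes, assign graded labels to negatives, fuse two classifiers) can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not name the statistical method, the label range, or the fusion rule, so the mean-gap statistic, the [-1, 0] label scale, and the weighted-average fusion below are all assumptions, and the function names are hypothetical. The joint optimization step is not covered.

```python
from statistics import mean


def decisive_attributes(pos, neg, k):
    """Pick the k attributes with the largest gap between the mean
    positive score and the mean negative score (assumed statistic)."""
    n_attr = len(pos[0])
    gaps = [
        mean(v[j] for v in pos) - mean(v[j] for v in neg)
        for j in range(n_attr)
    ]
    return sorted(range(n_attr), key=lambda j: gaps[j], reverse=True)[:k]


def fine_grained_labels(pos, neg, attrs):
    """Give each negative a label in [-1, 0] (assumed scale): the closer
    to 0, the more the video resembles the positives on `attrs`."""
    centre = [mean(v[j] for v in pos) for j in attrs]
    # Mean absolute distance to the positive centre, per negative video.
    dists = [
        mean(abs(v[j] - c) for j, c in zip(attrs, centre))
        for v in neg
    ]
    worst = max(dists) or 1.0  # avoid dividing by zero
    return [-d / worst for d in dists]


def fuse(attr_scores, feat_scores, weight=0.5):
    """Late fusion as a weighted average of two score lists (assumed rule)."""
    return [
        weight * a + (1 - weight) * f
        for a, f in zip(attr_scores, feat_scores)
    ]


if __name__ == "__main__":
    # Toy attribute scores per video; rows are videos, columns attributes.
    pos = [[0.9, 0.8, 0.1], [0.8, 0.9, 0.2]]
    neg = [[0.8, 0.7, 0.1], [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]]
    attrs = decisive_attributes(pos, neg, k=2)
    print(attrs)
    print(fine_grained_labels(pos, neg, attrs))
```

In this toy data the first negative video scores much like the positives on the selected attributes, so it receives a label nearer to 0 than the other two, which is the sense in which negatives are "not equally negative".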

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/120169