Searching persuasively: Joint event detection and evidence recounting with limited supervision

Chang, X; Yu, YL; Yang, Y; Hauptmann, AG


Publication Type: Conference Proceeding
Citation: MM 2015 - Proceedings of the 2015 ACM Multimedia Conference, 2015, pp. 581 - 590
Issue Date: 2015-10-13

Closed Access

	Filename	Description	Size	Format
	s.pdf	Published version	1.68 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Chang, X https://orcid.org/0000-0002-7778-8807	en_US
dc.contributor.author	Yu, YL	en_US
dc.contributor.author	Yang, Y https://orcid.org/0000-0001-5528-0546	en_US
dc.contributor.author	Hauptmann, AG	en_US
dc.date.issued	2015-10-13	en_US
dc.identifier.citation	MM 2015 - Proceedings of the 2015 ACM Multimedia Conference, 2015, pp. 581 - 590	en_US
dc.identifier.isbn	9781450334594	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121767
dc.description.abstract	© 2015 ACM. Multimedia event detection (MED) and multimedia event recounting (MER) are fundamental tasks in managing large amounts of unconstrained web videos, and have attracted a lot of attention in recent years. Most existing systems perform MER as a postprocessing step on top of the MED results. In order to leverage the mutual benefits of the two tasks, we propose a joint framework that simultaneously detects high-level events and localizes the indicative concepts of the events. Our premise is that a good recounting algorithm should not only explain the detection result, but should also be able to assist detection in the first place. Coupled in a joint optimization framework, recounting improves detection by pruning irrelevant noisy concepts while detection directs recounting to the most discriminative evidence. To better utilize the powerful and interpretable semantic video representation, we segment each video into several shots and exploit the rich temporal structures at shot level. The consequent computational challenge is carefully addressed through a significant improvement of the current ADMM algorithm, which, after eliminating all inner loops and equipping it with novel closed-form solutions for all intermediate steps, enables us to efficiently process extremely large video corpora. We test the proposed method on the large-scale TRECVID MEDTest 2014 and MEDTest 2013 datasets, and obtain very promising results for both MED and MER.	en_US
dc.relation.ispartof	MM 2015 - Proceedings of the 2015 ACM Multimedia Conference	en_US
dc.relation.isbasedon	10.1145/2733373.2806218	en_US
dc.title	Searching persuasively: Joint event detection and evidence recounting with limited supervision	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/121767