Joint action recognition and summarization by sub-modular inference

Hussein, F; Awwad, S; Piccardi, M

Publication Type:: Conference Proceeding
Citation:: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2016, 2016-May pp. 2697 - 2701
Issue Date:: 2016-05-18

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Hussein, F	en_US
dc.contributor.author	Awwad, S	en_US
dc.contributor.author	Piccardi, M https://orcid.org/0000-0001-9250-6604	en_US
dc.date.available	2015-12-22	en_US
dc.date.issued	2016-05-18	en_US
dc.identifier.citation	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2016, 2016-May pp. 2697 - 2701	en_US
dc.identifier.isbn	9781479999880	en_US
dc.identifier.issn	1520-6149	en_US
dc.identifier.uri	http://hdl.handle.net/10453/43541
dc.description.abstract	© 2016 IEEE. Action recognition and video summarization are two important multimedia tasks that are useful for applications such as video indexing and retrieval, video surveillance, human-computer interaction and home intelligence. While many approaches exist in the literature for these two tasks, to date they have always been addressed separately. Instead, in this paper we start from the assumption that these two tasks should be tackled as a joint objective: on the one hand, action recognition can drive the selection of meaningful and informative summaries; on the other, recognizing actions from a summary rather than the entire video can in principle reduce noise and prove more accurate. To this end, we propose a novel approach for joint action recognition-summarization based on the latent structural SVM framework, together with an efficient algorithm for inferring the action and the summary based on the property of sub-modularity. Experimental results on a challenging benchmark, MSR DailyActivity3D, show that the approach is capable of achieving remarkable action recognition accuracy while providing appealing video summaries.	en_US
dc.relation.ispartof	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings	en_US
dc.relation.isbasedon	10.1109/ICASSP.2016.7472167	en_US
dc.title	Joint action recognition and summarization by sub-modular inference	en_US
dc.type	Conference Proceeding
utslib.citation.volume	2016-May	en_US
utslib.for	080104 Computer Vision	en_US
utslib.for	080109 Pattern Recognition and Data Mining	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computing and Communications
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	2016-May	en_US
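
The abstract only names the ingredients: a latent structural SVM, and an efficient algorithm that infers the action and the summary by exploiting sub-modularity. In a latent structural SVM, prediction maximizes a score jointly over the label and a latent variable, which here is presumably the summary. The sketch below is a rough illustration under stated assumptions and is not the authors' model. It swaps in a hypothetical per-class score made of a clipped frame-relevance term plus a facility-location coverage term. It builds each class's summary greedily under a frame budget and returns the class whose summary scores highest. All names (`joint_inference`, `greedy_summary`, `score`, `similarity`, `lam`, `budget`) and the similarity function are illustrative assumptions.

```python
"""Greedy sketch of joint label/summary inference with a submodular score.

Hypothetical model (NOT the paper's): for a class y and a set S of frame indices,
    F_y(S) = sum_{t in S} max(0, w_y . x_t) + lam * sum_i max_{j in S} sim(x_i, x_j)
i.e. a clipped class-relevance term plus a facility-location coverage term.
"""
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def similarity(a: Vector, b: Vector) -> float:
    # Hypothetical non-negative similarity in (0, 1]: 1 / (1 + squared distance).
    return 1.0 / (1.0 + sum((x - y) ** 2 for x, y in zip(a, b)))


def score(frames: List[Vector], selected: List[int], weights: Vector,
          lam: float, sim: List[List[float]]) -> float:
    # Clipped relevance is modular and non-negative, hence monotone.
    relevance = sum(max(0.0, dot(weights, frames[t])) for t in selected)
    if not selected:
        return relevance
    # Facility location: how well the summary "covers" every frame of the video.
    coverage = sum(max(sim[i][j] for j in selected) for i in range(len(frames)))
    return relevance + lam * coverage


def greedy_summary(frames: List[Vector], weights: Vector, budget: int,
                   lam: float, sim: List[List[float]]) -> Tuple[List[int], float]:
    # Standard greedy: repeatedly add the frame giving the largest score.
    selected: List[int] = []
    current = 0.0
    for _ in range(min(budget, len(frames))):
        best_t: Optional[int] = None
        best_value = float("-inf")
        for t in range(len(frames)):
            if t in selected:
                continue
            value = score(frames, selected + [t], weights, lam, sim)
            if value > best_value:
                best_t, best_value = t, value
        selected.append(best_t)
        current = best_value
    return sorted(selected), current


def joint_inference(frames: List[Vector], class_weights: Dict[str, Vector],
                    budget: int, lam: float = 1.0
                    ) -> Optional[Tuple[str, List[int], float]]:
    # Exact over the (few) class labels, greedy-approximate over the summary.
    sim = [[similarity(a, b) for b in frames] for a in frames]
    best: Optional[Tuple[str, List[int], float]] = None
    for label, weights in class_weights.items():
        summary, value = greedy_summary(frames, weights, budget, lam, sim)
        if best is None or value > best[2]:
            best = (label, summary, value)
    return best


if __name__ == "__main__":
    frames = [[2.0, 0.0], [1.5, 0.0], [0.0, 1.0], [0.0, 0.5]]
    weights = {"drink": [1.0, 0.0], "read": [0.0, 1.0]}
    print(joint_inference(frames, weights, budget=2, lam=0.0))
    # prints ('drink', [0, 1], 3.5)
```

With non-negative similarities and clipped relevances, both terms are monotone submodular. The greedy loop therefore carries the classical (1 - 1/e) approximation guarantee for a cardinality budget, which is the kind of efficiency that sub-modularity buys. Enumerating the labels outright is cheap when the number of action classes is small.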

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/43541