Joint Action Recognition and Summarization by Sub-modular Inference

Publisher:
IEEE
Publication Type:
Conference Proceeding
Citation:
Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016, pp. 2697-2701
Issue Date:
2016-03-25
Files in This Item:
ICASSP_cameraready.pdf (Accepted Manuscript version, 451.38 kB, Adobe PDF)
Action recognition and video summarization are two important multimedia tasks with applications in video indexing and retrieval, video surveillance, human-computer interaction and home intelligence. Although the literature offers many approaches to each task, to date the two have always been addressed separately. In this paper we instead start from the assumption that they should be tackled as a joint objective: on the one hand, action recognition can drive the selection of meaningful and informative summaries; on the other, recognizing actions from a summary rather than from the entire video can in principle reduce noise and prove more accurate. To this end, we propose a novel approach for joint action recognition and summarization based on the latent structural SVM framework, together with an efficient algorithm that infers both the action and the summary by exploiting the property of sub-modularity. Experimental results on a challenging benchmark, MSR DailyActivity3D, show that the approach achieves remarkable action recognition accuracy while providing appealing video summaries.
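The abstract does not spell out the scoring function, so the following is only a minimal sketch of what joint "classify by summarizing" inference with a sub-modular objective can look like, not the authors' method. It assumes a hypothetical per-class score made of a linear frame-relevance term (w_y · x_i, as a linear SVM would produce) plus a facility-location coverage term (a standard submodular function), maximized greedily under a frame budget; the class whose best summary scores highest is predicted. The names `greedy_summary`, `joint_infer`, `budget` and `lam` are illustrative, weights are taken as given (latent structural SVM training is not shown), and similarities are assumed non-negative.

```python
from typing import List, Optional, Sequence, Set, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def greedy_summary(
    frames: Sequence[Sequence[float]],
    w: Sequence[float],
    sim: Sequence[Sequence[float]],
    budget: int,
    lam: float = 1.0,
) -> Tuple[List[int], float]:
    """Greedily pick up to `budget` frames maximizing the ASSUMED score
    F(S) = sum_{i in S} w.x_i + lam * sum_j max_{i in S} sim[i][j],
    i.e. linear relevance plus facility-location coverage (0 for S = {}).
    Assumes sim[i][j] >= 0.
    """
    n = len(frames)
    relevance = [dot(w, f) for f in frames]
    covered = [0.0] * n  # covered[j]: best similarity of frame j to the summary so far
    chosen: Set[int] = set()
    score = 0.0
    for _ in range(min(budget, n)):
        best_i, best_gain = -1, float("-inf")
        for i in range(n):
            if i in chosen:
                continue
            # Marginal gain F(S + {i}) - F(S); ties keep the lowest index.
            cov_gain = sum(max(sim[i][j] - covered[j], 0.0) for j in range(n))
            gain = relevance[i] + lam * cov_gain
            if gain > best_gain:
                best_i, best_gain = i, gain
        if best_gain <= 0.0:
            break  # no remaining frame improves the score
        chosen.add(best_i)
        score += best_gain
        covered = [max(covered[j], sim[best_i][j]) for j in range(n)]
    return sorted(chosen), score


def joint_infer(
    frames: Sequence[Sequence[float]],
    class_weights: Sequence[Sequence[float]],
    sim: Sequence[Sequence[float]],
    budget: int,
    lam: float = 1.0,
) -> Optional[Tuple[int, List[int], float]]:
    """Return (label, summary, score) for the class whose summary scores highest."""
    best: Optional[Tuple[int, List[int], float]] = None
    for label, w in enumerate(class_weights):
        summary, score = greedy_summary(frames, w, sim, budget, lam)
        if best is None or score > best[2]:
            best = (label, summary, score)
    return best


if __name__ == "__main__":
    # Toy video: three frames resembling class 0, one resembling class 1.
    frames = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    sim = [[dot(a, b) for b in frames] for a in frames]
    print(joint_infer(frames, [[1.0, 0.0], [0.0, 1.0]], sim, budget=2, lam=0.5))
    # → (0, [0, 1], 3.5)
```

For a monotone submodular objective under a cardinality budget, greedy selection carries the classic (1 - 1/e) approximation guarantee; because the relevance term here can be negative, this toy objective is not necessarily monotone, so the sketch stops once no frame has positive gain and claims no such bound.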