Joint action recognition and summarization by sub-modular inference

Publication Type:
Conference Proceeding
Citation:
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 2016, 2016-May pp. 2697 - 2701
Issue Date:
2016-05-18
© 2016 IEEE. Action recognition and video summarization are two important multimedia tasks that are useful for applications such as video indexing and retrieval, video surveillance, human-computer interaction and home intelligence. While many approaches exist in the literature for these two tasks, to date they have always been addressed separately. In this paper we instead start from the assumption that these two tasks should be tackled as a joint objective: on the one hand, action recognition can drive the selection of meaningful and informative summaries; on the other, recognizing actions from a summary rather than from the entire video can in principle reduce noise and prove more accurate. To this aim, we propose a novel approach for joint action recognition and summarization based on the high-performing latent structural SVM framework, together with an efficient algorithm for inferring the action and the summary based on the property of sub-modularity. Experimental results on a challenging benchmark, MSR DailyActivity3D, show that the approach is capable of achieving remarkable action recognition accuracy while providing appealing video summaries.
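The abstract does not spell out the inference algorithm, so the following is only a minimal sketch of the general idea behind sub-modular summary selection, not the authors' method. It assumes a hypothetical frame-to-frame similarity matrix `sim` with non-negative entries and a facility-location objective (a standard monotone sub-modular function), and it picks frames greedily by marginal gain. The names `coverage`, `greedy_summary` and `budget` are illustrative assumptions, and the joint inference over the action label described in the paper is not modelled here.

```python
from typing import List, Sequence, Set


def coverage(selected: Sequence[int], sim: List[List[float]]) -> float:
    """Facility-location score: for each frame, take its best similarity
    to any selected frame, then sum over all frames."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_summary(sim: List[List[float]], budget: int) -> List[int]:
    """Select up to `budget` frame indices by repeatedly adding the frame
    with the largest marginal gain in the (assumed) coverage objective."""
    chosen: List[int] = []
    remaining: Set[int] = set(range(len(sim)))
    while remaining and len(chosen) < budget:
        base = coverage(chosen, sim)
        # Iterate in sorted order so the scan over candidates is deterministic.
        best = max(
            sorted(remaining),
            key=lambda j: coverage(chosen + [j], sim) - base,
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy input: frames 0 and 1 are near-duplicates, as are 2 and 3.
toy_sim = [
    [1.0, 0.9, 0.2, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.2, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
summary = greedy_summary(toy_sim, budget=2)
```

For a monotone sub-modular objective under a cardinality constraint, this greedy rule is known to reach at least a (1 - 1/e) fraction of the optimal value, which is why sub-modularity makes efficient approximate inference possible. On the toy input the sketch tends to pick one frame from each near-duplicate pair rather than two frames from the same pair.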