Sparse coding-based spatiotemporal saliency for action recognition

Publication Type:
Conference Proceeding
Proceedings - International Conference on Image Processing, ICIP, 2015, 2015-December, pp. 2045-2049
© 2015 IEEE. In this paper, we address the problem of human action recognition by representing image sequences as a sparse collection of patch-level spatiotemporal events that are salient in both the space and time domains. Our method uses a multi-scale volumetric representation of video and adaptively selects the optimal space-time scale at which a patch's saliency is most significant. The input image sequences are first partitioned into non-overlapping patches. Each patch is then represented by a vector of coefficients that linearly reconstructs the patch from a learned dictionary of basis patches. We propose to measure the spatiotemporal saliency of patches using Shannon's self-information entropy, so that a patch's saliency is determined by the variation of information content within its spatiotemporal neighborhood. Experimental results on two benchmark datasets demonstrate the effectiveness of the proposed method.
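The pipeline described in the abstract (non-overlapping patches, sparse codes over a dictionary, self-information saliency over a space-time neighborhood) can be illustrated with a minimal single-scale sketch. It is not the authors' implementation and rests on several assumptions: the video is a grayscale NumPy array of shape (T, H, W); a random unit-norm dictionary stands in for the learned one; orthogonal matching pursuit (OMP) is used as the sparse coder, since the abstract does not name a solver; and saliency is taken as -log p(c_i), with p a Gaussian kernel density of patch i's code evaluated against its 3x3x3 patch neighborhood. The adaptive multi-scale selection is omitted, and all function names are hypothetical.

```python
import numpy as np


def extract_patches(video, pt, ph, pw):
    """Split a (T, H, W) video into non-overlapping pt x ph x pw patches.

    Returns shape (nt, nh, nw, pt*ph*pw); leftover frames/pixels are dropped.
    """
    T, H, W = video.shape
    nt, nh, nw = T // pt, H // ph, W // pw
    v = video[: nt * pt, : nh * ph, : nw * pw]
    # Split each axis into (grid index, offset), then move grid axes first.
    v = v.reshape(nt, pt, nh, ph, nw, pw).transpose(0, 2, 4, 1, 3, 5)
    return v.reshape(nt, nh, nw, pt * ph * pw)


def omp(D, x, k):
    """Orthogonal matching pursuit (assumed solver): code x with <= k atoms of D (k >= 1)."""
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(k):
        # Pick the atom most correlated with what is still unexplained.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        sub = D[:, support]
        sol, *_ = np.linalg.lstsq(sub, x, rcond=None)
        residual = x - sub @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef[support] = sol
    return coef


def sparse_code(patches, D, k):
    """Encode every patch vector with OMP; returns (nt, nh, nw, n_atoms)."""
    flat = patches.reshape(-1, patches.shape[-1])
    codes = np.array([omp(D, x, k) for x in flat])
    return codes.reshape(*patches.shape[:3], D.shape[1])


def self_information_saliency(codes, sigma=1.0, radius=1):
    """Hypothetical saliency: -log p(c_i), p = Gaussian kernel density of the
    patch's code against the codes of its space-time neighbors."""
    nt, nh, nw, n_atoms = codes.shape
    sal = np.zeros((nt, nh, nw))
    for t in range(nt):
        for y in range(nh):
            for x in range(nw):
                t0, t1 = max(t - radius, 0), min(t + radius + 1, nt)
                y0, y1 = max(y - radius, 0), min(y + radius + 1, nh)
                x0, x1 = max(x - radius, 0), min(x + radius + 1, nw)
                block = codes[t0:t1, y0:y1, x0:x1].reshape(-1, n_atoms)
                # Remove the centre patch so it is compared only with neighbors.
                centre = ((t - t0) * (y1 - y0) + (y - y0)) * (x1 - x0) + (x - x0)
                nbrs = np.delete(block, centre, axis=0)
                if len(nbrs) == 0:
                    continue  # isolated patch: saliency stays 0
                d2 = np.sum((nbrs - codes[t, y, x]) ** 2, axis=1)
                p = np.mean(np.exp(-d2 / (2.0 * sigma**2)))
                sal[t, y, x] = -np.log(max(p, 1e-12))
    return sal


rng = np.random.default_rng(0)
# ASSUMPTION: random unit-norm atoms stand in for a learned dictionary.
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)

video = np.zeros((8, 16, 16))
video[4:8, 4:8, 8:12] = 10.0  # one patch-sized space-time event
patches = extract_patches(video, 4, 4, 4)
codes = sparse_code(patches, D, k=5)
sal = self_information_saliency(codes, sigma=1.0)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(sal), sal.shape))
print(peak)  # (1, 1, 2)
```

In this toy video the only patch whose code differs from its neighbors is the inserted event, so its kernel density is low and its self-information is the largest. A multi-scale variant along the lines of the abstract would repeat this at several patch sizes and keep, per location, the scale with the most significant saliency.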