Static action recognition by efficient greedy inference

Publication Type:
Conference Proceeding
2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016, 2016
Issue Date:
Filename: SAR_v4.pdf
Description: Accepted Manuscript version
Size: 985.22 kB
Format: Adobe PDF
© 2016 IEEE. Action recognition from a single image is an important task for applications such as image annotation, robotic navigation and video surveillance, among others. Existing methods for recognizing actions from still images rely mainly on either bag-of-features representations or pose estimation from articulated body-part models. However, the relationship between the action and the image that contains it remains largely unexplored. In fact, the presence of particular objects or specific backgrounds is likely to provide informative cues for recognizing the action. For this reason, in this paper we propose approaching action recognition by first partitioning the entire image into superpixels and then using their latent classes as attributes of the action. The action class is predicted by a graphical model composed of measurements from each superpixel and a fully connected graph of superpixel classes. The model is learned with a latent structural SVM approach, and an efficient greedy algorithm is proposed for inference over the graph. Unlike most existing methods, the proposed approach does not require annotation of the actor (usually provided as a bounding box). Experimental results on the challenging Stanford 40 Action dataset show a mean average precision of 72.3%, the highest achieved to date.
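The abstract does not spell out the potentials or the greedy procedure, so the following is only a minimal sketch of one generic way greedy inference over a fully connected graph of latent superpixel classes could look. It assumes a per-action score made of per-superpixel (unary) terms plus a term for every pair of superpixels, and uses plain coordinate ascent (relabel one superpixel at a time while it strictly improves the score). The function names, the data layout and the toy numbers are all hypothetical and are not taken from the paper.

```python
from itertools import combinations


def score(unary, pairwise, labels):
    """Unary term per superpixel plus one pairwise term per superpixel pair."""
    total = sum(unary[i][k] for i, k in enumerate(labels))
    total += sum(pairwise[labels[i]][labels[j]]
                 for i, j in combinations(range(len(labels)), 2))
    return total


def greedy_labels(unary, pairwise, max_sweeps=20):
    """Greedy coordinate ascent over latent classes (an assumed scheme)."""
    n, num_classes = len(unary), len(pairwise)
    # Start from the best label for each superpixel taken in isolation.
    labels = [max(range(num_classes), key=lambda k, i=i: unary[i][k])
              for i in range(n)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            def local(k):
                # Part of the total score that depends on superpixel i's label.
                pair = sum(pairwise[labels[j]][k] if j < i
                           else pairwise[k][labels[j]]
                           for j in range(n) if j != i)
                return unary[i][k] + pair

            best = max(range(num_classes), key=local)
            if local(best) > local(labels[i]):  # accept strict improvements only
                labels[i] = best
                changed = True
        if not changed:  # local optimum reached
            break
    return labels


def predict_action(models):
    """Pick the action whose greedily inferred labelling scores highest."""
    results = {}
    for action, (unary, pairwise) in models.items():
        labels = greedy_labels(unary, pairwise)
        results[action] = (score(unary, pairwise, labels), labels)
    best_action = max(results, key=lambda a: results[a][0])
    return best_action, results[best_action][1]


# Toy example: 3 superpixels, 2 latent classes, made-up scores.
toy_models = {
    "riding": ([[2.0, 0.0], [0.4, 0.5], [2.0, 0.0]],
               [[1.0, 0.0], [0.0, 1.0]]),
    "other": ([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
              [[0.0, 0.0], [0.0, 0.0]]),
}
action, latent = predict_action(toy_models)
```

In the toy "riding" model the middle superpixel initially prefers class 1 on its unary score alone, and the pairwise terms then pull it to class 0 to agree with its neighbours. Coordinate ascent of this kind only guarantees a local optimum, which is the usual trade-off for avoiding exhaustive search over all labellings.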