Static Action Recognition by Efficient Greedy Inference

Publisher:
IEEE
Publication Type:
Conference Proceeding
Citation:
Proceedings of the 2016 IEEE Winter Conference on Applications of Computer Vision, 2016, pp. 1 - 8
Issue Date:
2016-03-09
Action recognition from a single image is an important task for applications such as image annotation, robotic navigation, and video surveillance, among others. Existing methods for recognizing actions in still images rely mainly on either bag-of-features representations or pose estimation from articulated body-part models. However, the relationship between the action and the image that contains it remains largely unexplored. In fact, the presence of particular objects or backgrounds is likely to provide informative cues for recognizing the action. For this reason, in this paper we propose approaching action recognition by first partitioning the entire image into superpixels and then using their latent classes as attributes of the action. The action class is predicted by a graphical model composed of measurements from each superpixel and a fully-connected graph over the superpixel classes. The model is learned with a latent structural SVM approach, and an efficient greedy algorithm is proposed for inference over the graph. Unlike most existing methods, the proposed approach does not require annotation of the actor (usually provided as a bounding box). Experimental results on the challenging Stanford 40 Action dataset show a mean average precision of 72.3%, the highest reported to date.
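To illustrate the kind of greedy inference the abstract refers to, the sketch below performs coordinate-ascent over superpixel latent classes in a fully-connected graph. This is a hypothetical reconstruction, not the paper's actual algorithm: the score decomposition into `unary` (per-superpixel measurement scores) and `pairwise` (class-compatibility scores shared across all superpixel pairs) is assumed for illustration.

```python
import numpy as np

def greedy_inference(unary, pairwise, max_iters=10):
    """Greedy coordinate-ascent over latent superpixel classes.

    unary:    (n_superpixels, n_classes) score of assigning each class
              to each superpixel, from its local measurements.
    pairwise: (n_classes, n_classes) compatibility score between the
              classes of any two superpixels (fully-connected graph).
    Returns the class label chosen for each superpixel.
    """
    n, _ = unary.shape
    labels = unary.argmax(axis=1)  # initialize from unary scores alone
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            # Score every candidate class for superpixel i,
            # holding all other superpixel labels fixed.
            others = np.delete(labels, i)
            scores = unary[i] + pairwise[:, others].sum(axis=1)
            best = int(scores.argmax())
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # converged: a full sweep made no update
            break
    return labels
```

Each sweep costs O(n · k · n) for n superpixels and k classes, so the loop stays cheap even though the underlying graph is fully connected, which is the point of a greedy scheme over exact inference.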