Static action recognition by efficient greedy inference

Abidi, S; Piccardi, M; Williams, MA


Publication Type: Conference Proceeding
Citation: 2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016, 2016
Issue Date: 2016-05-23

Closed Access

	Filename	Description	Size	Format
	SAR_v4.pdf	Accepted Manuscript version	985.22 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Abidi, S	en_US
dc.contributor.author	Piccardi, M https://orcid.org/0000-0001-9250-6604	en_US
dc.contributor.author	Williams, MA https://orcid.org/0000-0002-1047-0503	en_US
dc.date.available	2016-01-20	en_US
dc.date.issued	2016-05-23	en_US
dc.identifier.citation	2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016, 2016	en_US
dc.identifier.isbn	9781509006410	en_US
dc.identifier.uri	http://hdl.handle.net/10453/43542
dc.relation	http://purl.org/au-research/grants/arc/DP120102876
dc.relation.ispartof	2016 IEEE Winter Conference on Applications of Computer Vision, WACV 2016	en_US
dc.relation.isbasedon	10.1109/WACV.2016.7477686	en_US
dc.title	Static action recognition by efficient greedy inference	en_US
dc.type	Conference Proceeding
utslib.for	080104 Computer Vision	en_US
utslib.for	080109 Pattern Recognition and Data Mining	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Arts and Social Sciences
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

© 2016 IEEE. Action recognition from a single image is an important task for applications such as image annotation, robotic navigation and video surveillance. Existing methods for recognizing actions from still images mainly rely on either bag-of-features representations or pose estimation with articulated body-part models. However, the relationship between the action and the rest of the image remains largely unexplored, even though the presence of particular objects or backgrounds is likely to provide informative cues for recognizing the action. For this reason, in this paper we propose to approach action recognition by first partitioning the entire image into superpixels and then using their latent classes as attributes of the action. The action class is predicted with a graphical model composed of measurements from each superpixel and a fully connected graph of superpixel classes. The model is learned with a latent structural SVM, and an efficient greedy algorithm is proposed for inference over the graph. Unlike most existing methods, the proposed approach does not require annotation of the actor (usually provided as a bounding box). Experiments on the challenging Stanford 40 Action dataset report a mean average precision of 72.3%, the highest achieved at the time of publication.
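The abstract describes the model only at a high level. The following is a minimal sketch of one standard greedy inference scheme for a model of this shape, under assumptions that are ours and not the paper's: each superpixel i has a feature vector x_i and a latent class h_i in {0, ..., K-1}; the score of action y is a sum of unary terms w[y][h_i]·x_i plus a pairwise term p[y][h_i][h_j] for every pair of superpixels (the fully connected graph); and the weights are already learned (for example by a latent structural SVM, not shown). Inference here uses iterated conditional modes (ICM): each label is repeatedly set to the class that maximizes the score with all other labels fixed, until no label changes. The names `score`, `greedy_labels` and `predict` and the parameter layout are hypothetical, and the authors' actual greedy algorithm may differ.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical parameter layout (an assumption, not taken from the paper):
#   unary[y][k]   -> weight vector for latent class k under action y
#   pair[y][k][l] -> scalar compatibility of classes k and l under action y
Feature = Sequence[float]


def dot(a: Feature, b: Feature) -> float:
    return sum(u * v for u, v in zip(a, b))


def score(y: int, labels: List[int], feats: List[Feature], unary, pair) -> float:
    """Score of action y given latent superpixel labels (fully connected graph)."""
    s = sum(dot(unary[y][h], x) for h, x in zip(labels, feats))
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):  # one pairwise term per pair i < j
            s += pair[y][labels[i]][labels[j]]
    return s


def greedy_labels(y: int, feats: List[Feature], unary, pair,
                  max_sweeps: int = 10) -> List[int]:
    """ICM-style greedy inference of the latent labels for a fixed action y."""
    n, num_classes = len(feats), len(unary[y])
    # Start each superpixel at the class with the best unary term.
    labels = [max(range(num_classes), key=lambda k: dot(unary[y][k], x))
              for x in feats]

    def local(i: int, k: int) -> float:
        # Only the terms of the score that depend on the label of superpixel i.
        s = dot(unary[y][k], feats[i])
        for j in range(n):
            if j < i:
                s += pair[y][labels[j]][k]
            elif j > i:
                s += pair[y][k][labels[j]]
        return s

    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            best = max(range(num_classes), key=lambda k: local(i, k))
            # Move only on strict improvement, so the score never decreases.
            if local(i, best) > local(i, labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


def predict(feats: List[Feature], unary, pair) -> Tuple[int, List[int], float]:
    """Return (action, latent labels, score) with the highest greedy score."""
    best: Optional[Tuple[int, List[int], float]] = None
    for y in range(len(unary)):
        labels = greedy_labels(y, feats, unary, pair)
        s = score(y, labels, feats, unary, pair)
        if best is None or s > best[2]:
            best = (y, labels, s)
    if best is None:
        raise ValueError("need at least one action class")
    return best
```

As written, one sweep costs roughly O(N²K) pairwise lookups for N superpixels and K latent classes. Because a label changes only on strict improvement, the score is non-decreasing and the loop stops, but like any greedy scheme it can settle in a local optimum rather than the exact maximum.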

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/43542