Integrating local action elements for action analysis

Thi, TH; Cheng, L; Zhang, J; Wang, L; Satoh, S

Publication Type: Journal Article
Citation: Computer Vision and Image Understanding, 2012, 116 (3), pp. 378-395
Issue Date: 2012-03-01

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Thi, TH	en_US
dc.contributor.author	Cheng, L	en_US
dc.contributor.author	Zhang, J https://orcid.org/0000-0002-7240-3541	en_US
dc.contributor.author	Wang, L https://orcid.org/0000-0002-5079-8992	en_US
dc.contributor.author	Satoh, S	en_US
dc.date.issued	2012-03-01	en_US
dc.identifier.citation	Computer Vision and Image Understanding, 2012, 116 (3), pp. 378 - 395	en_US
dc.identifier.issn	1077-3142	en_US
dc.identifier.uri	http://hdl.handle.net/10453/22914
dc.relation.ispartof	Computer Vision and Image Understanding	en_US
dc.relation.isbasedon	10.1016/j.cviu.2011.09.007	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Integrating local action elements for action analysis	en_US
dc.type	Journal Article
utslib.citation.volume	3	en_US
utslib.citation.volume	116	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	1702 Cognitive Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
utslib.copyright.status	closed_access
pubs.issue	3	en_US
pubs.publication-status	Published	en_US
pubs.volume	116	en_US

Abstract:

In this paper, we propose a framework for human action analysis from video footage. In our view, a video action sequence is a dynamic structure of sparse local spatial-temporal patches termed action elements, so action analysis is carried out here based on the set of local characteristics as well as the global shape of a prescribed action. We first detect a set of action elements that are the most compact entities of an action, then we extend the idea of the Implicit Shape Model to space-time in order to properly integrate the spatial and temporal properties of these action elements. In particular, we consider two different recipes for constructing action elements. The first uses a Sparse Bayesian Feature Classifier to choose action elements from all detected Spatial Temporal Interest Points; these are termed discriminative action elements. The second detects affine invariant local features from the holistic Motion History Images and picks action elements according to their compactness scores; these are called generative action elements. Action elements detected in either way are then used to construct a voting space based on their local feature representations as well as their global configuration constraints. Our approach is evaluated in the two main contexts of current human action analysis challenges: action retrieval and action classification. Comprehensive experimental results show that our proposed framework marginally outperforms all existing state-of-the-art techniques on a range of different datasets. © 2011 Elsevier Inc. All rights reserved.
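
The voting space mentioned in the abstract can be illustrated with a minimal sketch of Implicit Shape Model style voting extended to space-time. This is not the authors' implementation: the function name `vote_action_center`, the tuple layouts, the offset table, and the grid cell size are all assumptions made for illustration. Each action element is assumed to be already matched to a codeword, and each codeword is assumed to store learned (dx, dy, dt) offsets to the action center with a weight.

```python
from collections import defaultdict
from math import floor


def vote_action_center(elements, offsets_by_codeword, cell=1.0):
    """Accumulate space-time votes and return the strongest cell.

    elements: iterable of (x, y, t, codeword) tuples (hypothetical layout).
    offsets_by_codeword: dict mapping codeword -> list of
        (dx, dy, dt, weight) offsets to the action center (hypothetical).
    cell: edge length of one voting cell in the (x, y, t) grid.
    """
    votes = defaultdict(float)
    for x, y, t, codeword in elements:
        # Every stored offset for this codeword casts one weighted vote.
        for dx, dy, dt, weight in offsets_by_codeword.get(codeword, []):
            cell_index = (
                floor((x + dx) / cell),
                floor((y + dy) / cell),
                floor((t + dt) / cell),
            )
            votes[cell_index] += weight
    if not votes:
        return None, 0.0
    # The cell with the highest accumulated weight is the hypothesis.
    best = max(votes, key=votes.get)
    return best, votes[best]


# Toy data: two elements agree on the center, one disagrees.
offsets = {0: [(2, 0, 1, 1.0)], 1: [(-2, 0, 1, 1.0)]}
elements = [(2, 3, 1, 0), (6, 3, 1, 1), (0, 0, 0, 0)]
center, score = vote_action_center(elements, offsets)
print(center, score)  # (4, 3, 2) 2.0
```

In this toy setting the global configuration constraint shows up as agreement between votes: elements whose offsets point to the same space-time cell reinforce each other, while an inconsistent element contributes only an isolated vote.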

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/22914