Embedding motion and structure features for action recognition

Zhen, X; Shao, L; Tao, D; Li, X

Publication Type: Journal Article
Citation: IEEE Transactions on Circuits and Systems for Video Technology, 2013, 23 (7), pp. 1182 - 1190
Issue Date: 2013-07-01

Closed Access

	Filename	Description	Size	Format
	2013001083OK.pdf		1.81 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhen, X	en_US
dc.contributor.author	Shao, L	en_US
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449	en_US
dc.contributor.author	Li, X	en_US
dc.date.issued	2013-07-01	en_US
dc.identifier.citation	IEEE Transactions on Circuits and Systems for Video Technology, 2013, 23 (7), pp. 1182 - 1190	en_US
dc.identifier.issn	1051-8215	en_US
dc.identifier.uri	http://hdl.handle.net/10453/27175
dc.description.abstract	We propose a novel method to model human actions by explicitly coding motion and structure features that are separately extracted from video sequences. First, the motion template (one feature map) is applied to encode the motion information, and image planes (five feature maps) are extracted from the volume of frame differences to capture the structure information. The Gaussian pyramid and center-surround operations are performed on each of the six obtained feature maps, decomposing each feature map into a set of subband maps. Biologically inspired features are then extracted by successively applying Gabor filtering and max pooling on each subband map. To obtain a compact representation, discriminative locality alignment is employed to embed the high-dimensional features into a low-dimensional manifold space. In contrast to sparse representations based on detected interest points, which suffer from the loss of structure information, the proposed model takes into account the motion and structure information simultaneously and integrates them in a unified framework; it therefore provides an informative and compact representation of human actions. The proposed method is evaluated on the KTH, the multiview IXMAS, and the challenging UCF Sports datasets and outperforms state-of-the-art techniques on action recognition. © 2013 IEEE.	en_US
dc.relation	http://purl.org/au-research/grants/arc/DP120103730
dc.relation.ispartof	IEEE Transactions on Circuits and Systems for Video Technology	en_US
dc.relation.isbasedon	10.1109/TCSVT.2013.2240916	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Embedding motion and structure features for action recognition	en_US
dc.type	Journal Article
utslib.citation.volume	7	en_US
utslib.citation.volume	23	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access
pubs.issue	7	en_US
pubs.publication-status	Published	en_US
pubs.volume	23	en_US

Abstract:

We propose a novel method to model human actions by explicitly coding motion and structure features that are separately extracted from video sequences. First, the motion template (one feature map) is applied to encode the motion information, and image planes (five feature maps) are extracted from the volume of frame differences to capture the structure information. The Gaussian pyramid and center-surround operations are performed on each of the six obtained feature maps, decomposing each feature map into a set of subband maps. Biologically inspired features are then extracted by successively applying Gabor filtering and max pooling on each subband map. To obtain a compact representation, discriminative locality alignment is employed to embed the high-dimensional features into a low-dimensional manifold space. In contrast to sparse representations based on detected interest points, which suffer from the loss of structure information, the proposed model takes into account the motion and structure information simultaneously and integrates them in a unified framework; it therefore provides an informative and compact representation of human actions. The proposed method is evaluated on the KTH, the multiview IXMAS, and the challenging UCF Sports datasets and outperforms state-of-the-art techniques on action recognition. © 2013 IEEE.
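
The Gabor filtering and max pooling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, kernel size, wavelength, sigma, number of orientations, pooling block size, and the toy input map are all assumptions chosen for illustration, and the Gaussian pyramid, center-surround, and discriminative locality alignment stages are omitted.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real part of a Gabor filter as a size x size list of lists (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def filter_valid(image, kernel):
    """Cross-correlate image with kernel (no padding); return absolute responses."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = 0.0
            for u in range(k):
                for v in range(k):
                    acc += image[i + u][j + v] * kernel[u][v]
            out_row.append(abs(acc))
        out.append(out_row)
    return out


def max_pool(fmap, block):
    """Non-overlapping max pooling over block x block cells."""
    rows = len(fmap) // block
    cols = len(fmap[0]) // block
    return [
        [
            max(fmap[i * block + u][j * block + v]
                for u in range(block) for v in range(block))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def extract_features(image, orientations=4, size=5, block=4):
    """Filter one map at several orientations, pool, and concatenate the results."""
    features = []
    for n in range(orientations):
        theta = n * math.pi / orientations
        kernel = gabor_kernel(size, wavelength=4.0, theta=theta, sigma=2.0)
        pooled = max_pool(filter_valid(image, kernel), block)
        features.extend(value for row in pooled for value in row)
    return features


# Hypothetical 12 x 12 map with a vertical edge, standing in for one subband map.
toy_map = [[0.0] * 6 + [1.0] * 6 for _ in range(12)]
features = extract_features(toy_map)
print(len(features))
```

In this sketch each orientation yields an 8 x 8 response map that is pooled down to 2 x 2, so the four orientations give a 16-dimensional vector for a single map. A full pipeline along the lines of the abstract would repeat this over every subband map of all six feature maps before the dimensionality reduction step.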

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/27175