Structured learning of local features for human action classification and localization

Thi, TH; Cheng, L; Zhang, J; Wang, L; Satoh, S

Publication Type: Journal Article
Citation: Image and Vision Computing, 2012, 30 (1), pp. 1-14
Issue Date: 2012-01-01

Closed Access

	Filename	Description	Size	Format
	2012000993OK.pdf	Published Version	4.25 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Thi, TH	en_US
dc.contributor.author	Cheng, L	en_US
dc.contributor.author	Zhang, J https://orcid.org/0000-0002-7240-3541	en_US
dc.contributor.author	Wang, L	en_US
dc.contributor.author	Satoh, S	en_US
dc.date.issued	2012-01-01	en_US
dc.identifier.citation	Image and Vision Computing, 2012, 30 (1), pp. 1 - 14	en_US
dc.identifier.issn	0262-8856	en_US
dc.identifier.uri	http://hdl.handle.net/10453/31599
dc.description.abstract	Human action recognition is a promising yet non-trivial computer vision field with many potential applications. Recent advances in bag-of-feature approaches have brought significant insights into recognizing human actions in complex contexts. It is, however, common practice in the literature to treat an action as merely an orderless set of local salient features. This representation has been shown to be oversimplified, which inherently limits the robust deployment of traditional approaches in real-life scenarios. In this work, we propose and show that, by taking into account the global configuration of local features, we can greatly improve recognition performance. We first introduce a novel feature selection process called the Sparse Hierarchical Bayes Filter, which selects only the most contributive features of each action type based on neighboring structure constraints. We then present the application of structured learning to human action analysis. That is, by representing a human action as a complex set of local features, we can incorporate different spatial and temporal feature constraints into the learning tasks of human action classification and localization. In particular, we tackle the problem of action localization in video using structured learning with two alternatives: one is the Dynamic Conditional Random Field, from a probabilistic perspective; the other is the Structural Support Vector Machine, from a max-margin point of view. We evaluate our modular classification-localization framework on various testbeds, in which the proposed framework is shown to be highly effective and robust compared with bag-of-feature methods. © 2011 Elsevier B.V. All rights reserved.	en_US
dc.relation.ispartof	Image and Vision Computing	en_US
dc.relation.isbasedon	10.1016/j.imavis.2011.12.006	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Structured learning of local features for human action classification and localization	en_US
dc.type	Journal Article
utslib.citation.volume	1	en_US
utslib.citation.volume	30	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
utslib.copyright.status	closed_access
pubs.issue	1	en_US
pubs.publication-status	Published	en_US
pubs.volume	30	en_US
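
The abstract contrasts an orderless bag-of-features representation with one that accounts for the configuration of local features. The toy sketch below illustrates only that contrast. It is not the paper's method (it implements neither the Sparse Hierarchical Bayes Filter nor the Dynamic Conditional Random Field or Structural Support Vector Machine models), and all function names, codeword labels, positions, and the radius are hypothetical values chosen for illustration.

```python
import numpy as np

def bag_of_features(codewords, vocab_size):
    """Orderless histogram of codeword labels; feature positions are ignored."""
    hist = np.bincount(codewords, minlength=vocab_size).astype(float)
    return hist / hist.sum()

def neighbour_cooccurrence(codewords, positions, vocab_size, radius):
    """Count ordered codeword pairs whose (x, y, t) positions lie within `radius`.

    A hypothetical stand-in for a structure-aware descriptor, used here only
    to show that spatio-temporal layout carries information.
    """
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep close pairs, excluding each feature paired with itself.
    close = (dists <= radius) & ~np.eye(len(codewords), dtype=bool)
    counts = np.zeros((vocab_size, vocab_size))
    i, j = np.nonzero(close)
    np.add.at(counts, (codewords[i], codewords[j]), 1)
    return counts

# Four local features at fixed (x, y, t) positions: two tight clusters.
positions = np.array([[0, 0, 0], [1, 0, 0], [5, 5, 5], [6, 5, 5]], dtype=float)

# Two made-up clips containing the same codewords in different arrangements.
clip_a = np.array([0, 1, 2, 3])
clip_b = np.array([2, 1, 0, 3])

hist_a = bag_of_features(clip_a, vocab_size=4)
hist_b = bag_of_features(clip_b, vocab_size=4)
pairs_a = neighbour_cooccurrence(clip_a, positions, vocab_size=4, radius=1.5)
pairs_b = neighbour_cooccurrence(clip_b, positions, vocab_size=4, radius=1.5)

print("identical histograms:", np.allclose(hist_a, hist_b))
print("identical neighbour structure:", np.array_equal(pairs_a, pairs_b))
```

Both clips produce the same histogram, so an orderless model cannot tell them apart, whereas the neighbour co-occurrence counts differ because the codewords sit next to different neighbours.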

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/31599