Recognizing human actions from low-resolution videos by region-based mixture models

Zhao, Y; Di, H; Zhang, J; Lu, Y; Lv, F

Publication Type: Conference Proceeding
Citation: Proceedings - IEEE International Conference on Multimedia and Expo, 2016, 2016-August
Issue Date: 2016-08-25

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhao, Y	en_US
dc.contributor.author	Di, H	en_US
dc.contributor.author	Zhang, J https://orcid.org/0000-0002-7240-3541	en_US
dc.contributor.author	Lu, Y	en_US
dc.contributor.author	Lv, F	en_US
dc.date.issued	2016-08-25	en_US
dc.identifier.citation	Proceedings - IEEE International Conference on Multimedia and Expo, 2016, 2016-August	en_US
dc.identifier.isbn	9781467372589	en_US
dc.identifier.issn	1945-7871	en_US
dc.identifier.uri	http://hdl.handle.net/10453/54553
dc.description.abstract	© 2016 IEEE. Recognizing human actions from low-resolution (LR) videos is essential for many applications, including large-scale video surveillance, sports video analysis and intelligent aerial vehicles. Currently, state-of-the-art performance in action recognition is achieved with dense trajectories, which are extracted by optical flow algorithms. However, optical flow algorithms are far from perfect on LR videos. In addition, the spatial and temporal layout of features is a powerful cue for action discrimination, yet most existing methods encode this layout by first segmenting body parts, which is not feasible in LR videos. To address these problems, we adopt the Layered Elastic Motion Tracking (LEMT) method to extract a set of long-term motion trajectories and a long-term common shape from each video sequence, where the extracted trajectories are much denser than those of sparse interest points (SIPs); we then present a hybrid feature representation that integrates both shape and motion features; and finally we propose a Region-based Mixture Model (RMM) for action classification. The RMM models the spatial layout of features without any need for body-part segmentation. Experiments are conducted on two publicly available LR human action datasets. Of these, the UT-Tower dataset is particularly challenging because the average height of human figures is only about 20 pixels. The proposed approach attains near-perfect accuracy on both datasets.	en_US
dc.relation.ispartof	Proceedings - IEEE International Conference on Multimedia and Expo	en_US
dc.relation.isbasedon	10.1109/ICME.2016.7552886	en_US
dc.title	Recognizing human actions from low-resolution videos by region-based mixture models	en_US
dc.type	Conference Proceeding
utslib.citation.volume	2016-August	en_US
utslib.for	080101 Adaptive Agents and Intelligent Robotics	en_US
utslib.for	080110 Simulation and Modelling	en_US
utslib.for	080109 Pattern Recognition and Data Mining	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	2016-August	en_US

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/54553