Deep appearance and motion learning for egocentric activity recognition

Wang, X; Gao, L; Song, J; Zhen, X; Sebe, N; Shen, HT

Publication Type: Journal Article
Citation: Neurocomputing, 2018, 275 pp. 438 - 447
Issue Date: 2018-01-31

Closed Access

Filename	Description	Size
1-s2.0-S0925231217314935-main.pdf	Published Version	1.9 MB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Wang, X	en_US
dc.contributor.author	Gao, L	en_US
dc.contributor.author	Song, J	en_US
dc.contributor.author	Zhen, X	en_US
dc.contributor.author	Sebe, N	en_US
dc.contributor.author	Shen, HT	en_US
dc.date.issued	2018-01-31	en_US
dc.identifier.citation	Neurocomputing, 2018, 275 pp. 438 - 447	en_US
dc.identifier.issn	0925-2312	en_US
dc.identifier.uri	http://hdl.handle.net/10453/131369
dc.description.abstract	© 2017 Elsevier B.V. Egocentric activity recognition has recently gained great popularity in computer vision due to its widespread applications in egocentric video analysis. However, compared to conventional third-person activity recognition tasks, it poses new challenges caused by factors such as significant body shaking, varied lengths, and poor recording quality. To handle these challenges, we propose deep appearance and motion learning (DAML) for egocentric activity recognition, which leverages the strength of deep learning networks in feature learning. In contrast to hand-crafted visual features or pre-trained convolutional neural network (CNN) features, which have limited generality to new egocentric videos, the proposed DAML is built on the deep autoencoder (DAE) and directly extracts appearance and motion features, the main cues of activities, from egocentric videos. The DAML takes advantage of the effectiveness and efficiency of the DAE in unsupervised feature learning, which provides a new representation learning framework for egocentric videos. The appearance and motion features learned by the DAML are seamlessly fused into a rich, informative egocentric activity representation that can be readily fed into any supervised learning model for activity recognition. Experimental results on two challenging benchmark datasets show that the DAML achieves high performance on both short- and long-term egocentric activity recognition tasks, comparable to or even better than state-of-the-art counterparts.	en_US
dc.relation.ispartof	Neurocomputing	en_US
dc.relation.isbasedon	10.1016/j.neucom.2017.08.063	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.title	Deep appearance and motion learning for egocentric activity recognition	en_US
dc.type	Journal Article
utslib.citation.volume	275	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	08 Information and Computing Sciences	en_US
utslib.for	09 Engineering	en_US
utslib.for	17 Psychology and Cognitive Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Software
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	275	en_US

Abstract:

© 2017 Elsevier B.V. Egocentric activity recognition has recently gained great popularity in computer vision due to its widespread applications in egocentric video analysis. However, compared to conventional third-person activity recognition tasks, it poses new challenges caused by factors such as significant body shaking, varied lengths, and poor recording quality. To handle these challenges, we propose deep appearance and motion learning (DAML) for egocentric activity recognition, which leverages the strength of deep learning networks in feature learning. In contrast to hand-crafted visual features or pre-trained convolutional neural network (CNN) features, which have limited generality to new egocentric videos, the proposed DAML is built on the deep autoencoder (DAE) and directly extracts appearance and motion features, the main cues of activities, from egocentric videos. The DAML takes advantage of the effectiveness and efficiency of the DAE in unsupervised feature learning, which provides a new representation learning framework for egocentric videos. The appearance and motion features learned by the DAML are seamlessly fused into a rich, informative egocentric activity representation that can be readily fed into any supervised learning model for activity recognition. Experimental results on two challenging benchmark datasets show that the DAML achieves high performance on both short- and long-term egocentric activity recognition tasks, comparable to or even better than state-of-the-art counterparts.
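
The abstract describes the pipeline only at a high level, so the following is a rough sketch of the general idea rather than the authors' method. It assumes a one-hidden-layer autoencoder (not a deep one), random synthetic vectors standing in for appearance and motion descriptors, concatenation as the fusion step, and a nearest-centroid rule as a stand-in for "any supervised learning model". The names train_autoencoder, appearance, motion, and fused are all hypothetical.

```python
import numpy as np


def train_autoencoder(X, hidden, epochs=400, lr=0.005, seed=0):
    """Fit a one-hidden-layer autoencoder by gradient descent (illustrative only).

    Returns an encode function and the per-epoch reconstruction losses.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # encoder: nonlinear hidden code
        R = H @ W2 + b2                   # decoder: linear reconstruction
        err = R - X
        losses.append(float((err ** 2).sum() / n))
        G = 2.0 * err / n                 # gradient of the loss w.r.t. R
        dZ = (G @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dZ)
        b1 -= lr * dZ.sum(axis=0)

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses


# Synthetic stand-ins (assumption): 100 clips, two activity classes.
rng = np.random.default_rng(42)
labels = np.repeat([0, 1], 50)
sign = 2 * labels - 1
appearance = rng.normal(size=(100, 8)) + sign[:, None] * 1.0
motion = rng.normal(size=(100, 6)) - sign[:, None] * 1.0

# Learn one unsupervised encoder per modality; labels are not used here.
enc_a, loss_a = train_autoencoder(appearance, hidden=4, seed=1)
enc_m, loss_m = train_autoencoder(motion, hidden=3, seed=2)

# Fuse the two learned representations by concatenation (assumed fusion rule).
fused = np.hstack([enc_a(appearance), enc_m(motion)])

# Stand-in supervised model: assign each clip to the nearest class centroid.
centroids = np.stack([fused[labels == c].mean(axis=0) for c in (0, 1)])
dists = ((fused[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
pred = dists.argmin(axis=1)

print("fused feature shape:", fused.shape)
print("training accuracy:", float((pred == labels).mean()))
```

The point of the sketch is the division of labor: the encoders are trained without labels, and only the final classifier sees them, so the fused representation could be passed to a different supervised model without retraining the feature learners.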

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/131369