Sequential deep learning for action recognition with synthetic multi-view data from depth maps

Liang, B; Zheng, L; Li, X

Publisher: Springer Singapore
Publication Type: Conference Proceeding
Citation: Communications in Computer and Information Science, 2019, 996, pp. 360-371
Issue Date: 2019-01-01

Closed Access

Filename	Description	Size
Liang2019_Chapter_SequentialDeepLearningForActio.pdf	Published version	1.75 MB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Liang, B https://orcid.org/0000-0002-6605-2167
dc.contributor.author	Zheng, L
dc.contributor.author	Li, X
dc.date.accessioned	2020-06-20T20:38:09Z
dc.date.available	2020-06-20T20:38:09Z
dc.date.issued	2019-01-01
dc.identifier.citation	Communications in Computer and Information Science, 2019, 996, pp. 360-371
dc.identifier.isbn	9789811366604
dc.identifier.issn	1865-0929
dc.identifier.issn	1865-0937
dc.identifier.uri	http://hdl.handle.net/10453/141586
dc.language	en
dc.publisher	Springer Singapore
dc.relation.ispartof	Communications in Computer and Information Science
dc.relation.isbasedon	10.1007/978-981-13-6661-1_28
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Sequential deep learning for action recognition with synthetic multi-view data from depth maps
dc.type	Conference Proceeding
utslib.citation.volume	996
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/A/DRsch The Data Science Institute
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	closed_access	*
dc.date.updated	2020-06-20T20:38:06Z
pubs.publication-status	Published
pubs.volume	996

Abstract:

© Springer Nature Singapore Pte Ltd. 2019. Recurrent neural networks (RNNs) have recently proven successful in action recognition. However, depth sequences are high-dimensional and contain rich human dynamics, which makes it difficult for traditional RNNs to capture complex action information. This paper addresses the problem of human action recognition from sequences of depth maps using sequential deep learning. The proposed method first synthesizes multi-view depth sequences by rotating 3D point clouds derived from the depth maps. Each depth sequence is then split into short-term temporal segments. For each segment, a multi-view depth motion template (MVDMT), which compresses the segment into a motion template, is constructed as a short-term multi-view action representation. The MVDMT effectively characterizes the multi-view appearance and motion patterns within a short duration. Convolutional neural network (CNN) models extract features from the MVDMTs, and a CNN-RNN network then learns an effective representation of the sequential patterns in the multi-view depth sequence. The proposed multi-view sequential deep learning framework can simultaneously capture spatio-temporal appearance and motion features in the depth sequence. The method has been evaluated on the MSR Action3D and MSR Action Pairs datasets, achieving promising results compared with state-of-the-art methods based on depth data.
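
The first three steps named in the abstract (view synthesis by point-cloud rotation, splitting into short-term segments, and compressing each segment into a motion template) can be illustrated with a minimal sketch. It is not the authors' implementation: the full text is closed access, so the exact MVDMT construction is unknown, and the accumulated absolute frame difference used here is only a stand-in. All function names are hypothetical. The sketch also assumes an orthographic projection in which pixel indices and depth values share one unit, a rotation about the vertical axis through the point-cloud centroid, and non-overlapping segments. It uses plain Python lists for self-containment; a practical version would likely use an array library.

```python
import math

def depth_to_points(depth):
    """Convert a depth map (rows of values, 0 = background) to (x, y, z) points.
    Assumption: orthographic projection, pixel indices used directly as x and y."""
    return [(float(x), float(y), float(z))
            for y, row in enumerate(depth)
            for x, z in enumerate(row) if z > 0]

def rotate_y(points, angle_deg, center):
    """Rotate points about a vertical axis passing through center = (cx, cz)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cz = center
    rotated = []
    for x, y, z in points:
        dx, dz = x - cx, z - cz
        rotated.append((cx + cos_a * dx + sin_a * dz, y,
                        cz - sin_a * dx + cos_a * dz))
    return rotated

def points_to_depth(points, height, width):
    """Re-project points to a depth map, keeping the nearest surface per pixel."""
    depth = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        col, row = round(x), round(y)
        if 0 <= row < height and 0 <= col < width and z > 0:
            if depth[row][col] == 0 or z < depth[row][col]:
                depth[row][col] = z
    return depth

def synthesize_view(depth, angle_deg):
    """Synthesize the depth map seen after rotating the point cloud by angle_deg."""
    height, width = len(depth), len(depth[0])
    points = depth_to_points(depth)
    if not points:  # empty frame: nothing to rotate
        return [[0.0] * width for _ in range(height)]
    cx = sum(p[0] for p in points) / len(points)
    cz = sum(p[2] for p in points) / len(points)
    return points_to_depth(rotate_y(points, angle_deg, (cx, cz)), height, width)

def split_segments(frames, segment_len):
    """Split a frame sequence into non-overlapping segments (last may be shorter)."""
    return [frames[i:i + segment_len] for i in range(0, len(frames), segment_len)]

def motion_template(segment):
    """Stand-in template: accumulate absolute differences of consecutive frames."""
    height, width = len(segment[0]), len(segment[0][0])
    template = [[0.0] * width for _ in range(height)]
    for prev, curr in zip(segment, segment[1:]):
        for r in range(height):
            for c in range(width):
                template[r][c] += abs(curr[r][c] - prev[r][c])
    return template

def multi_view_templates(frames, angles, segment_len):
    """For each segment, build one template per synthesized view angle."""
    views = {a: [synthesize_view(f, a) for f in frames] for a in angles}
    n_segments = len(split_segments(frames, segment_len))
    return [{a: motion_template(split_segments(views[a], segment_len)[i])
             for a in angles} for i in range(n_segments)]
```

In this sketch each segment yields one template per view angle; in the pipeline the abstract describes, templates of this kind would be passed to CNN feature extractors and the per-segment features fed in temporal order to the recurrent network. The choice of angles and segment length is left open here because the abstract does not state them.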

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/141586