PoseGate-Former: Transformer Encoder with Trainable Gate for 3D Human Pose Estimation Using Weakly Supervised Learning

Guan, S; Lu, H; Zhu, L; Fang, G

Publisher: Springer International Publishing
Publication Type: Chapter
Citation: Neural Information Processing, 2021, 1517 CCIS, pp. 266-274
Issue Date: 2021-01-01

This item is open access.

The embargo period expires on 1 Jan 2023

Full metadata record

Field	Value	Language
dc.contributor.author	Guan, S
dc.contributor.author	Lu, H https://orcid.org/0000-0001-5655-0237
dc.contributor.author	Zhu, L
dc.contributor.author	Fang, G https://orcid.org/0000-0003-0845-6718
dc.date.accessioned	2022-02-04T04:13:16Z
dc.date.available	2022-02-04T04:13:16Z
dc.date.issued	2021-01-01
dc.identifier.citation	Neural Information Processing, 2021, 1517 CCIS, pp. 266-274
dc.identifier.isbn	9783030923099
dc.identifier.uri	http://hdl.handle.net/10453/154191
dc.description.abstract	Weakly supervised learning for 3D human pose estimation can learn a real human structure, but it generally has lower accuracy in reconstructing 3D poses. In this work, we present PoseGate-Former, a 3D pose estimation model that uses a Transformer-encoder-based architecture with a trainable gate. The model is trained on individual images with a weakly supervised learning approach. Adding a trainable gate to the Transformer encoder reduces the possibility of overfitting on some action categories. We evaluated the model on two benchmark datasets, Human3.6M and HumanEva-I. The experimental results show that it obtains substantially better accuracy in all action categories of 3D human poses in these datasets than some fully supervised 3D pose estimation approaches.
dc.language	en
dc.publisher	Springer International Publishing
dc.relation.ispartof	Neural Information Processing
dc.relation.isbasedon	10.1007/978-3-030-92310-5_31
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.rights	This is a post-peer-review, pre-copyedit version of a book chapter published in Neural Information Processing: 28th International Conference, ICONIP 2021, Sanur, Bali, Indonesia, December 8-12, 2021, Proceedings, Part VI. The final authenticated version is available online at: https://link.springer.com/chapter/10.1007/978-3-030-92310-5_31
dc.title	PoseGate-Former: Transformer Encoder with Trainable Gate for 3D Human Pose Estimation Using Weakly Supervised Learning
dc.type	Chapter
utslib.citation.volume	1517 CCIS
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2023-01-01T00:00:00+1000Z
dc.date.updated	2022-02-04T04:13:13Z
pubs.publication-status	Published
pubs.volume	1517 CCIS

Abstract:

Weakly supervised learning for 3D human pose estimation can learn a real human structure, but it generally has lower accuracy in reconstructing 3D poses. In this work, we present PoseGate-Former, a 3D pose estimation model that uses a Transformer-encoder-based architecture with a trainable gate. The model is trained on individual images with a weakly supervised learning approach. Adding a trainable gate to the Transformer encoder reduces the possibility of overfitting on some action categories. We evaluated the model on two benchmark datasets, Human3.6M and HumanEva-I. The experimental results show that it obtains substantially better accuracy in all action categories of 3D human poses in these datasets than some fully supervised 3D pose estimation approaches.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/154191