TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering

Pan, X; Yang, Z; Ma, J; Zhou, C; Yang, Y

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: Proceedings of the IEEE International Conference on Computer Vision, 2024, 00, pp. 3521-3532
Issue Date: 2024-01-01

Filename	Description	Size	Format
1704445.pdf	Published version	3.75 MB	Adobe PDF

This item is new to OPUS and is not currently available.

Full metadata record

Field	Value	Language
dc.contributor.author	Pan, X
dc.contributor.author	Yang, Z
dc.contributor.author	Ma, J
dc.contributor.author	Zhou, C
dc.contributor.author	Yang, Y https://orcid.org/0000-0002-0512-880X
dc.date	2023-10-01
dc.date.accessioned	2024-08-07T04:48:17Z
dc.date.available	2024-08-07T04:48:17Z
dc.date.issued	2024-01-01
dc.identifier.citation	Proceedings of the IEEE International Conference on Computer Vision, 2024, 00, pp. 3521-3532
dc.identifier.isbn	979-8-3503-0718-4
dc.identifier.issn	1550-5499
dc.identifier.uri	http://hdl.handle.net/10453/180234
dc.description.abstract	In this paper, we focus on the task of generalizable neural human rendering, which trains conditional Neural Radiance Fields (NeRF) from multi-view videos of different characters. To handle dynamic human motion, previous methods have primarily used a SparseConvNet (SPC)-based human representation to process the painted SMPL. However, such an SPC-based representation i) optimizes under the volatile observation space, which leads to pose misalignment between the training and inference stages, and ii) lacks the global relationships among human parts that are critical for handling the incomplete painted SMPL. To tackle these issues, we present a new framework named TransHuman, which learns the painted SMPL under the canonical space and captures the global relationships between human parts with transformers. Specifically, TransHuman is mainly composed of Transformer-based Human Encoding (TransHE), Deformable Partial Radiance Fields (DPaRF), and Fine-grained Detail Integration (FDI). TransHE first processes the painted SMPL under the canonical space via transformers to capture the global relationships between human parts. Then, DPaRF binds each output token with a deformable radiance field for encoding the query point under the observation space. Finally, FDI is employed to further integrate fine-grained information from reference images. Extensive experiments on ZJU-MoCap and H36M show that TransHuman achieves new state-of-the-art performance by a significant margin, with high efficiency. Project page: https://pansanity666.github.io/TransHuman/
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	Proceedings of the IEEE International Conference on Computer Vision
dc.relation.ispartof	IEEE/CVF International Conference on Computer Vision
dc.relation.isbasedon	10.1109/ICCV51070.2023.00328
dc.rights	info:eu-repo/semantics/restrictedAccess
dc.title	TransHuman: A Transformer-based Human Representation for Generalizable Neural Human Rendering
dc.type	Conference Proceeding
utslib.citation.volume	00
utslib.location.activity	Paris, France
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	recently_added	*
pubs.consider-herdc	false
dc.date.updated	2024-08-07T04:48:12Z
pubs.finish-date	2023-10-06
pubs.place-of-publication	Piscataway, USA
pubs.publication-status	Published
pubs.start-date	2023-10-01
pubs.volume	00
dc.location	Piscataway, USA

Abstract:

In this paper, we focus on the task of generalizable neural human rendering, which trains conditional Neural Radiance Fields (NeRF) from multi-view videos of different characters. To handle dynamic human motion, previous methods have primarily used a SparseConvNet (SPC)-based human representation to process the painted SMPL. However, such an SPC-based representation i) optimizes under the volatile observation space, which leads to pose misalignment between the training and inference stages, and ii) lacks the global relationships among human parts that are critical for handling the incomplete painted SMPL. To tackle these issues, we present a new framework named TransHuman, which learns the painted SMPL under the canonical space and captures the global relationships between human parts with transformers. Specifically, TransHuman is mainly composed of Transformer-based Human Encoding (TransHE), Deformable Partial Radiance Fields (DPaRF), and Fine-grained Detail Integration (FDI). TransHE first processes the painted SMPL under the canonical space via transformers to capture the global relationships between human parts. Then, DPaRF binds each output token with a deformable radiance field for encoding the query point under the observation space. Finally, FDI is employed to further integrate fine-grained information from reference images. Extensive experiments on ZJU-MoCap and H36M show that TransHuman achieves new state-of-the-art performance by a significant margin, with high efficiency. Project page: https://pansanity666.github.io/TransHuman/
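
To illustrate what "global relationships between human parts" can mean in practice, the following is a minimal sketch of single-head self-attention over a handful of body-part tokens, written in plain Python. It is not the authors' TransHE implementation: the token count, the feature size, the random features, and the use of identity query/key/value projections are all assumptions chosen only to keep the example short.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (an assumption).

    tokens: list of N feature vectors, one per hypothetical body-part token.
    Returns the N updated tokens and the N x N attention weights.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        # Compare this part's token with every part's token, itself included.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        row = softmax(scores)
        # Each output is a weighted mix of all tokens, so every part sees every other part.
        mixed = [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        outputs.append(mixed)
        weights.append(row)
    return outputs, weights


random.seed(0)
NUM_PARTS, FEATURE_DIM = 6, 4  # hypothetical sizes, not taken from the paper
part_tokens = [[random.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)] for _ in range(NUM_PARTS)]
updated_tokens, attention = self_attention(part_tokens)
```

In this sketch every row of `attention` sums to one and assigns some weight to every token, so a part with weak or missing input features can still draw on the others. That is the general intuition behind using a global operator for an incomplete painted SMPL; the paper itself should be consulted for the actual architecture.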

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/180234