Single-View 3D Object Reconstruction from Shape Priors in Memory

Yang, S; Xu, M; Xie, H; Perry, S; Xia, J

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 00, pp. 3151-3160
Issue Date: 2021-11-13

This item is open access.

The embargo period expires on 30 Nov 2023

Full metadata record

Field	Value	Language
dc.contributor.author	Yang, S
dc.contributor.author	Xu, M https://orcid.org/0000-0001-9581-8849
dc.contributor.author	Xie, H
dc.contributor.author	Perry, S https://orcid.org/0000-0002-2794-3178
dc.contributor.author	Xia, J
dc.date	2021-06-20
dc.date.accessioned	2022-06-05T01:34:11Z
dc.date.available	2022-06-05T01:34:11Z
dc.date.issued	2021-11-13
dc.identifier.citation	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, 00, pp. 3151-3160
dc.identifier.isbn	978-1-6654-4509-2
dc.identifier.issn	1063-6919
dc.identifier.issn	2575-7075
dc.identifier.uri	http://hdl.handle.net/10453/157933
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
dc.relation.ispartof	IEEE/CVF Conference on Computer Vision and Pattern Recognition
dc.relation.ispartofseries	IEEE Conference on Computer Vision and Pattern Recognition
dc.relation.isbasedon	10.1109/cvpr46437.2021.00317
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.title	Single-View 3D Object Reconstruction from Shape Priors in Memory
dc.type	Conference Proceeding
utslib.citation.volume	00
utslib.location.activity	Nashville, TN, USA
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - INEXT - Innovation in IT Services and Applications
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	open_access	*
pubs.consider-herdc	false
utslib.copyright.embargo	2023-11-30T00:00:00+1000Z
dc.date.updated	2022-06-05T01:34:06Z
pubs.finish-date	2021-06-25
pubs.place-of-publication	Piscataway, USA
pubs.publication-status	Published
pubs.start-date	2021-06-20
pubs.volume	00
dc.location	Piscataway, USA

Abstract:

Existing methods for single-view 3D object reconstruction directly learn to transform image features into 3D representations. However, these methods are vulnerable to images containing noisy backgrounds and heavy occlusions because the extracted image features do not contain enough information to reconstruct high-quality 3D shapes. Humans routinely use incomplete or noisy visual cues from an image to retrieve similar 3D shapes from their memory and reconstruct the 3D shape of an object. Inspired by this, we propose a novel method, named Mem3D, that explicitly constructs shape priors to supplement the missing information in the image. Specifically, the shape priors take the form of "image-voxel" pairs in the memory network, which are stored by a well-designed writing strategy during training. We also propose a voxel triplet loss function that helps to retrieve the precise 3D shapes that are highly related to the input image from shape priors. An LSTM-based shape encoder is introduced to extract information from the retrieved 3D shapes, which are useful in recovering the 3D shape of an object that is heavily occluded or in complex environments. Experimental results demonstrate that Mem3D significantly improves reconstruction quality and performs favorably against state-of-the-art methods on the ShapeNet and Pix3D datasets.
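
The retrieval idea in the abstract (store "image-voxel" pairs, then look up shapes whose image features resemble the query) can be illustrated with a toy key-value memory. This is only a sketch under stated assumptions, not the Mem3D implementation: the names ShapeMemory, cosine, and triplet_loss are hypothetical, cosine similarity and top-k lookup are stand-ins for whatever the paper uses, every pair is simply appended rather than selected by the paper's writing strategy, and the loss shown is a generic triplet margin loss rather than the paper's voxel triplet loss.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ShapeMemory:
    """Toy memory: keys are image features, values are voxel grids (hypothetical)."""

    def __init__(self):
        self.keys = []
        self.values = []

    def write(self, image_feature, voxels):
        # Assumption: append every pair; the paper's writing strategy is not modeled.
        self.keys.append(list(image_feature))
        self.values.append(voxels)

    def read(self, query_feature, k=1):
        # Rank stored keys by similarity to the query and return the top-k voxel grids.
        ranked = sorted(
            range(len(self.keys)),
            key=lambda i: cosine(query_feature, self.keys[i]),
            reverse=True,
        )
        return [self.values[i] for i in ranked[:k]]


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Generic triplet margin loss on cosine distance (assumed stand-in)."""
    d_pos = 1.0 - cosine(anchor, positive)
    d_neg = 1.0 - cosine(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


memory = ShapeMemory()
memory.write([1.0, 0.0], (1, 1, 0, 0))  # flattened toy voxel grid for shape A
memory.write([0.0, 1.0], (0, 0, 1, 1))  # flattened toy voxel grid for shape B

# A noisy query close to the first key retrieves the first stored shape.
retrieved = memory.read([0.9, 0.1], k=1)
```

In a learned system the keys would come from an image encoder and the retrieved voxel grids would be passed to a shape encoder; here both are hand-written vectors so the lookup can be followed by eye.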

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/157933