Point Clouds Are Specialized Images: A Knowledge Transfer Approach for 3D Understanding

Kang, J; Jia, W; He, X; Lam, KM

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Multimedia, 2024, PP, (99), pp. 1-11
Issue Date: 2024-01-01

Embargoed

Filename	Description	Size	Format
Point Clouds Are Specialized Images A Knowledge Transfer Approach for 3D Understanding.pdf	Accepted version	1.5 MB	Adobe PDF

This item is currently unavailable due to the publisher's embargo.

The embargo period expires on 11 Jun 2026

Full metadata record

Field	Value	Language
dc.contributor.author	Kang, J
dc.contributor.author	Jia, W https://orcid.org/0000-0002-0940-3338
dc.contributor.author	He, X
dc.contributor.author	Lam, KM
dc.date.accessioned	2024-08-05T14:20:46Z
dc.date.available	2024-08-05T14:20:46Z
dc.date.issued	2024-01-01
dc.identifier.citation	IEEE Transactions on Multimedia, 2024, PP, (99), pp. 1-11
dc.identifier.issn	1520-9210
dc.identifier.issn	1941-0077
dc.identifier.uri	http://hdl.handle.net/10453/180016
dc.description.abstract	Self-supervised representation learning (SSRL) has gained increasing attention in point cloud understanding as a way to address the challenges posed by 3D data scarcity and high annotation costs. This paper presents PCExpert, a novel SSRL approach that reinterprets point clouds as “specialized images”. This conceptual shift allows PCExpert to leverage knowledge derived from the large-scale image modality in a more direct and deeper manner, by extensively sharing parameters with a pretrained image encoder in a multi-way Transformer architecture. The parameter-sharing strategy, combined with an additional pretext task for pre-training, i.e., transformation estimation, enables PCExpert to outperform the state of the art in a variety of tasks, with a remarkable reduction in the number of trainable parameters. Notably, PCExpert's performance under LINEAR fine-tuning (e.g., 90.02% overall accuracy on ScanObjectNN) closely approaches the results obtained with FULL model fine-tuning (92.66%), demonstrating its effective representation capability.
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Transactions on Multimedia
dc.relation.isbasedon	10.1109/TMM.2024.3412330
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering
dc.subject.classification	Artificial Intelligence & Image Processing
dc.subject.classification	40 Engineering
dc.subject.classification	46 Information and computing sciences
dc.title	Point Clouds Are Specialized Images: A Knowledge Transfer Approach for 3D Understanding
dc.type	Journal Article
utslib.citation.volume	PP
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	University of Technology Sydney/All Manual Groups
pubs.organisational-group	University of Technology Sydney/All Manual Groups/Global Big Data Technologies Research Centre (GBDTC)
utslib.copyright.status	embargoed	*
utslib.copyright.embargo	2026-06-11T00:00:00+1000Z
dc.date.updated	2024-08-05T14:20:43Z
pubs.issue	99
pubs.publication-status	Published
pubs.volume	PP
utslib.citation.issue	99

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/180016