Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics

Lin, A; Li, J; Xiang, Y; Bian, W; Prasad, M

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Intelligent Vehicles, 2024, PP, (99), pp. 1-11
Issue Date: 2024-01-01

Closed Access

	Filename	Description	Size	Format
	1705998.pdf	Published version	16.76 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Lin, A
dc.contributor.author	Li, J https://orcid.org/0000-0002-1336-2241
dc.contributor.author	Xiang, Y
dc.contributor.author	Bian, W
dc.contributor.author	Prasad, M
dc.date.accessioned	2024-08-21T05:30:56Z
dc.date.available	2024-08-21T05:30:56Z
dc.date.issued	2024-01-01
dc.identifier.citation	IEEE Transactions on Intelligent Vehicles, 2024, PP, (99), pp. 1-11
dc.identifier.issn	2379-8858
dc.identifier.uri	http://hdl.handle.net/10453/180482
dc.description.abstract	High-quality surface normals can help improve geometry estimation in problems faced by autonomous vehicles, such as collision avoidance and occlusion inference. While a considerable volume of literature focuses on densely scanned indoor scenarios, normal estimation during autonomous driving remains an intricate problem due to the sparse, non-uniform, and noisy nature of real-world LiDAR scans. In this paper, we introduce a multi-modal technique that leverages 3D point clouds and 2D colour images obtained from LiDAR and camera sensors for surface normal estimation. We present the Hybrid Geometric Transformer (HGT), a novel transformer-based neural network architecture that fuses visual semantic and 3D geometric information. Furthermore, we develop an effective learning strategy for the multi-modal data. Experimental results demonstrate the superior effectiveness of our information fusion approach compared to existing methods. We also verify that the proposed model can learn from a simulated 3D environment that mimics a traffic scene. The learned geometric knowledge is transferable and can be applied to real-world 3D scenes in the KITTI dataset. Further tasks built upon the estimated normal vectors in the KITTI dataset show that the proposed estimator has an advantage over existing methods.
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Transactions on Intelligent Vehicles
dc.relation.isbasedon	10.1109/TIV.2024.3363174
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject.classification	4002 Automotive engineering
dc.subject.classification	4007 Control engineering, mechatronics and robotics
dc.subject.classification	4603 Computer vision and multimedia computation
dc.title	Normal Transformer: Extracting Surface Geometry from LiDAR Points Enhanced by Visual Semantics
dc.type	Journal Article
utslib.citation.volume	PP
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	University of Technology Sydney/All Manual Groups
pubs.organisational-group	University of Technology Sydney/All Manual Groups/Australian Artificial Intelligence Institute (AAII)
pubs.organisational-group	University of Technology Sydney/All Manual Groups/Centre for Built Infrastructure (CBI)
utslib.copyright.status	closed_access	*
dc.date.updated	2024-08-21T05:30:38Z
pubs.issue	99
pubs.publication-status	Published
pubs.volume	PP
utslib.citation.issue	99

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/180482