Spatial Pyramid-Enhanced NetVLAD With Weighted Triplet Loss for Place Recognition.

Yu, J; Zhu, C; Zhang, J; Huang, Q; Tao, D

Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Neural Networks and Learning Systems, 2020, 31, (2), pp. 661-674
Issue Date: 2020-02

Closed Access

	Filename	Description	Size	Format
	08700608.pdf	Published version	2.92 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Yu, J
dc.contributor.author	Zhu, C
dc.contributor.author	Zhang, J
dc.contributor.author	Huang, Q
dc.contributor.author	Tao, D https://orcid.org/0000-0001-7225-5449
dc.date.accessioned	2021-04-09T07:09:43Z
dc.date.available	2021-04-09T07:09:43Z
dc.date.issued	2020-02
dc.identifier.citation	IEEE transactions on neural networks and learning systems, 2020, 31, (2), pp. 661-674
dc.identifier.issn	2162-237X
dc.identifier.issn	2162-2388
dc.identifier.uri	http://hdl.handle.net/10453/147965
dc.description.abstract	We propose an end-to-end place recognition model based on a novel deep neural network. First, we propose to exploit the spatial pyramid structure of the images to enhance the vector of locally aggregated descriptors (VLAD) such that the enhanced VLAD features can reflect the structural information of the images. To encode this feature extraction into the deep learning method, we build a spatial pyramid-enhanced VLAD (SPE-VLAD) layer. Next, we impose weight constraints on the terms of the traditional triplet loss (T-loss) function such that the weighted T-loss (WT-loss) function avoids the suboptimal convergence of the learning process. The loss function can work well under weakly supervised scenarios in that it determines the semantically positive and negative samples of each query through not only the GPS tags but also the Euclidean distance between the image representations. The SPE-VLAD layer and the WT-loss layer are integrated with the VGG-16 network or ResNet-18 network to form a novel end-to-end deep neural network that can be easily trained via the standard backpropagation method. We conduct experiments on three benchmark data sets, and the results demonstrate that the proposed model defeats the state-of-the-art deep learning approaches applied to place recognition.
dc.format	Print-Electronic
dc.language	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation	http://purl.org/au-research/grants/arc/DP180103424
dc.relation.ispartof	IEEE transactions on neural networks and learning systems
dc.relation.isbasedon	10.1109/tnnls.2019.2908982
dc.rights	© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Spatial Pyramid-Enhanced NetVLAD With Weighted Triplet Loss for Place Recognition.
dc.type	Journal Article
utslib.citation.volume	31
utslib.location.activity	United States
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access	*
dc.date.updated	2021-04-09T07:09:42Z
pubs.issue	2
pubs.publication-status	Published
pubs.volume	31
utslib.citation.issue	2

Abstract:

We propose an end-to-end place recognition model based on a novel deep neural network. First, we propose to exploit the spatial pyramid structure of the images to enhance the vector of locally aggregated descriptors (VLAD) such that the enhanced VLAD features can reflect the structural information of the images. To encode this feature extraction into the deep learning method, we build a spatial pyramid-enhanced VLAD (SPE-VLAD) layer. Next, we impose weight constraints on the terms of the traditional triplet loss (T-loss) function such that the weighted T-loss (WT-loss) function avoids the suboptimal convergence of the learning process. The loss function can work well under weakly supervised scenarios in that it determines the semantically positive and negative samples of each query through not only the GPS tags but also the Euclidean distance between the image representations. The SPE-VLAD layer and the WT-loss layer are integrated with the VGG-16 network or ResNet-18 network to form a novel end-to-end deep neural network that can be easily trained via the standard backpropagation method. We conduct experiments on three benchmark data sets, and the results demonstrate that the proposed model defeats the state-of-the-art deep learning approaches applied to place recognition.
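The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the SPE-VLAD layer's exact form or the WT-loss weight constraints, so the sketch assumes hard-assignment VLAD computed per cell of a fixed grid pyramid, and a squared-Euclidean triplet loss whose two terms are scaled by placeholder weights. The function names and the parameters `levels`, `w_pos`, `w_neg`, and `margin` are hypothetical.

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector to (approximately) unit L2 norm."""
    return v / (np.linalg.norm(v) + eps)


def vlad(descriptors, centers):
    """Hard-assignment VLAD: per center, sum the residuals of the
    descriptors whose nearest center it is. Returns a (K*D,) vector."""
    k, d = centers.shape
    out = np.zeros((k, d))
    if len(descriptors) == 0:
        return out.ravel()
    # Squared distances between every descriptor and every center: (N, K).
    dists = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    for j in range(k):
        members = descriptors[nearest == j]
        if len(members):
            out[j] = (members - centers[j]).sum(axis=0)
    return out.ravel()


def spatial_pyramid_vlad(feature_map, centers, levels=(1, 2)):
    """Assumed pyramid scheme: for each grid size g in `levels`, split the
    (H, W, D) feature map into g x g cells, compute a normalized VLAD
    vector per cell, and concatenate everything."""
    h, w, d = feature_map.shape
    parts = []
    for g in levels:
        for rows in np.array_split(np.arange(h), g):
            for cols in np.array_split(np.arange(w), g):
                cell = feature_map[np.ix_(rows, cols)].reshape(-1, d)
                parts.append(l2_normalize(vlad(cell, centers)))
    return l2_normalize(np.concatenate(parts))


def weighted_triplet_loss(q, pos, negs, margin=0.1, w_pos=1.0, w_neg=1.0):
    """Triplet hinge loss with placeholder weights on the positive and
    negative distance terms (the paper's actual weights are not given
    in the abstract). `negs` has shape (M, D)."""
    d_pos = np.sum((q - pos) ** 2)
    d_negs = np.sum((negs - q) ** 2, axis=1)
    return np.maximum(0.0, w_pos * d_pos + margin - w_neg * d_negs).sum()


# Usage with random stand-in data.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(4, 4, 3))   # H=4, W=4, D=3 local descriptors
centers = rng.normal(size=(2, 3))          # K=2 cluster centers
rep = spatial_pyramid_vlad(feature_map, centers)
loss = weighted_triplet_loss(
    np.array([0.0, 0.0]),
    np.array([1.0, 0.0]),
    np.array([[2.0, 0.0], [0.0, 0.5]]),
)
```

With `levels=(1, 2)` the representation concatenates 1 + 4 = 5 cell-level VLAD vectors, so its length is 5 * K * D. In a trainable layer the hard assignment would be replaced by a differentiable soft assignment, as in NetVLAD.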

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/147965