Bilaterally Normalized Scale-Consistent Sinkhorn Distance for Few-Shot Image Classification.

Liu, Y; Zhu, L; Wang, X; Yamada, M; Yang, Y

Publisher: Institute of Electrical and Electronics Engineers
Publication Type: Journal Article
Citation: IEEE Transactions on Neural Networks and Learning Systems, 2024, PP, (8), pp. 11475-11485
Issue Date: 2024-04-17

Closed Access

	Filename	Description	Size	Format
	1641334.pdf	Published version	1.8 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Liu, Y
dc.contributor.author	Zhu, L
dc.contributor.author	Wang, X
dc.contributor.author	Yamada, M
dc.contributor.author	Yang, Y https://orcid.org/0000-0002-0512-880X
dc.date.accessioned	2024-08-21T05:22:42Z
dc.date.available	2024-08-21T05:22:42Z
dc.date.issued	2024-04-17
dc.identifier.citation	IEEE Transactions on Neural Networks and Learning Systems, 2024, PP, (8), pp. 11475-11485
dc.identifier.issn	1045-9227
dc.identifier.issn	1941-0093
dc.identifier.uri	http://hdl.handle.net/10453/180465
dc.format	Print-Electronic
dc.language	eng
dc.publisher	Institute of Electrical and Electronics Engineers
dc.relation	http://purl.org/au-research/grants/arc/DP200100938
dc.relation.ispartof	IEEE Transactions on Neural Networks and Learning Systems
dc.relation.isbasedon	10.1109/TNNLS.2023.3262351
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject.classification	Artificial Intelligence & Image Processing
dc.subject.classification	4602 Artificial intelligence
dc.title	Bilaterally Normalized Scale-Consistent Sinkhorn Distance for Few-Shot Image Classification.
dc.type	Journal Article
utslib.citation.volume	PP
utslib.location.activity	United States
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/All Manual Groups
pubs.organisational-group	University of Technology Sydney/All Manual Groups/Australian Artificial Intelligence Institute (AAII)
pubs.organisational-group	University of Technology Sydney/All Manual Groups/Australian Artificial Intelligence Institute (AAII)/Associate Member
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2024-08-21T05:22:40Z
pubs.issue	8
pubs.publication-status	Published online
pubs.volume	PP
utslib.citation.issue	8

Abstract:

Few-shot image classification aims to learn transferable features from base classes in order to recognize images of unseen novel classes from only a few labeled images. Existing methods usually compare support features with query features, either by matching global feature vectors or by matching local feature maps at the same position. However, a few labeled images cannot capture all the diverse contexts and intraclass variations, which leads to mismatch issues in existing methods. On the one hand, misaligned positions and cluttered backgrounds cause an object mismatch issue. On the other hand, scale inconsistency between images causes a scale mismatch issue. In this article, we propose the bilaterally normalized scale-consistent Sinkhorn distance (BSSD) to solve these issues. First, instead of same-position matching, we use the Sinkhorn distance to find an optimal matching between images, mitigating the object mismatch caused by misaligned positions. Meanwhile, we propose intraimage and interimage attentions as the bilateral normalization on the Sinkhorn distance to suppress the object mismatch caused by background clutter. Second, local feature maps are enhanced with a multiscale pooling strategy, enabling the Sinkhorn distance to find a consistent matching scale between images. Experimental results show the effectiveness of the proposed approach, which achieves state-of-the-art results on three few-shot benchmarks.
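
The matching step described in the abstract can be illustrated with a minimal sketch of a generic entropy-regularized Sinkhorn distance between two sets of local feature vectors. This is not the authors' BSSD implementation: the function names, the cosine cost, the regularization strength eps, and the iteration count are all assumptions made for illustration. The marginal weights a and b default to uniform; supplying attention-derived weights there is one plausible way a normalization could enter, but the abstract does not specify the exact form.

```python
import math


def cosine_cost(x, y):
    """Cost between two local feature vectors: 1 - cosine similarity."""
    dot = sum(p * q for p, q in zip(x, y))
    norm_x = math.sqrt(sum(p * p for p in x))
    norm_y = math.sqrt(sum(q * q for q in y))
    return 1.0 - dot / (norm_x * norm_y)


def sinkhorn_distance(xs, ys, a=None, b=None, eps=0.1, n_iters=200):
    """Entropy-regularized optimal transport between two feature sets.

    xs, ys : lists of local feature vectors (e.g. one per spatial position).
    a, b   : marginal weights over xs and ys, each summing to 1.
             Uniform by default (an assumption of this sketch).
    eps    : entropic regularization strength (hypothetical default).
    Returns (distance, transport_plan).
    """
    n, m = len(xs), len(ys)
    a = a or [1.0 / n] * n
    b = b or [1.0 / m] * m

    # Pairwise cost matrix and the corresponding Gibbs kernel.
    cost = [[cosine_cost(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]

    # Sinkhorn iterations: alternately rescale rows and columns so the
    # plan's marginals match a and b.
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]

    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    distance = sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))
    return distance, plan


# Toy example: the same two local features appear at swapped positions.
support = [[1.0, 0.0], [0.0, 1.0]]
query = [[0.0, 1.0], [1.0, 0.0]]

same_position = sum(cosine_cost(s, q) for s, q in zip(support, query)) / len(support)
distance, plan = sinkhorn_distance(support, query)
print("same-position cost:", same_position)
print("Sinkhorn distance:", distance)
```

In this toy case, same-position matching pairs orthogonal features and reports a high cost, while the transport plan reassigns each support feature to its displaced counterpart, so the Sinkhorn distance stays close to zero. That is the misalignment tolerance the abstract attributes to optimal matching.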

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/180465