Knowledge Distillation via the Target-aware Transformer

Lin, S; Xie, H; Wang, B; Yu, K; Chang, X; Liang, X; Wang, G

Publisher: IEEE COMPUTER SOC
Publication Type: Conference Proceeding
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, 2022-June, pp. 10905-10914
Issue Date: 2022-01-01

This item is open access.

The embargo period expires on 27 Sep 2024

Full metadata record

Field	Value	Language
dc.contributor.author	Lin, S
dc.contributor.author	Xie, H
dc.contributor.author	Wang, B
dc.contributor.author	Yu, K
dc.contributor.author	Chang, X https://orcid.org/0000-0002-7778-8807
dc.contributor.author	Liang, X
dc.contributor.author	Wang, G
dc.date	2022-06-18
dc.date.accessioned	2023-03-31T03:40:02Z
dc.date.available	2023-03-31T03:40:02Z
dc.date.issued	2022-01-01
dc.identifier.citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2022, 2022-June, pp. 10905-10914
dc.identifier.isbn	9781665469463
dc.identifier.issn	1063-6919
dc.identifier.uri	http://hdl.handle.net/10453/168919
dc.description.abstract	Knowledge distillation has become a de facto standard for improving the performance of small neural networks. Most previous works regress the representational features from the teacher to the student in a one-to-one spatial matching fashion. However, it is often overlooked that, due to architectural differences, the semantic information at the same spatial location usually varies. This greatly undermines the underlying assumption of the one-to-one distillation approach. To this end, we propose a novel one-to-all spatial matching knowledge distillation approach. Specifically, we allow each pixel of the teacher feature to be distilled to all spatial locations of the student features according to its similarity, which is generated by a target-aware transformer. Our approach surpasses the state-of-the-art methods by a significant margin on various computer vision benchmarks, such as ImageNet, Pascal VOC and COCOStuff10k. Code is available at https://github.com/sihaoevery/TaT.
dc.language	en
dc.publisher	IEEE COMPUTER SOC
dc.relation	http://purl.org/au-research/grants/arc/DE190100626
dc.relation.ispartof	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.relation.ispartof	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
dc.relation.ispartofseries	IEEE Conference on Computer Vision and Pattern Recognition
dc.relation.isbasedon	10.1109/CVPR52688.2022.01064
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.title	Knowledge Distillation via the Target-aware Transformer
dc.type	Conference Proceeding
utslib.citation.volume	2022-June
utslib.location.activity	New Orleans, LA
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2024-09-27T00:00:00+1000Z
dc.date.updated	2023-03-31T03:39:55Z
pubs.finish-date	2022-06-24
pubs.publication-status	Published
pubs.start-date	2022-06-18
pubs.volume	2022-June

Abstract:

Knowledge distillation has become a de facto standard for improving the performance of small neural networks. Most previous works regress the representational features from the teacher to the student in a one-to-one spatial matching fashion. However, it is often overlooked that, due to architectural differences, the semantic information at the same spatial location usually varies. This greatly undermines the underlying assumption of the one-to-one distillation approach. To this end, we propose a novel one-to-all spatial matching knowledge distillation approach. Specifically, we allow each pixel of the teacher feature to be distilled to all spatial locations of the student features according to its similarity, which is generated by a target-aware transformer. Our approach surpasses the state-of-the-art methods by a significant margin on various computer vision benchmarks, such as ImageNet, Pascal VOC and COCOStuff10k. Code is available at https://github.com/sihaoevery/TaT.
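
The contrast between one-to-one and one-to-all spatial matching can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that). It assumes plain dot-product similarity followed by a softmax, omits any learned projections, and uses hypothetical function names (`one_to_one_loss`, `one_to_all_loss`) and toy two-pixel feature maps chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def one_to_one_loss(teacher, student):
    """Mean squared error between features at the SAME spatial location."""
    per_pixel = [
        sum((s - t) ** 2 for s, t in zip(s_pix, t_pix))
        for s_pix, t_pix in zip(student, teacher)
    ]
    return sum(per_pixel) / len(teacher)


def one_to_all_loss(teacher, student):
    """Each teacher pixel is matched against ALL student locations.

    Assumption: similarity is a softmax over dot products; the student
    features are mixed with those weights before being compared with
    the teacher pixel.
    """
    channels = len(teacher[0])
    total = 0.0
    for t_pix in teacher:
        weights = softmax([dot(t_pix, s_pix) for s_pix in student])
        mixed = [
            sum(w * s_pix[c] for w, s_pix in zip(weights, student))
            for c in range(channels)
        ]
        total += sum((m - t) ** 2 for m, t in zip(mixed, t_pix))
    return total / len(teacher)


# Toy feature maps: two spatial locations, two channels each.
# The student holds the same semantics as the teacher, but at swapped locations.
teacher = [[4.0, 0.0], [0.0, 4.0]]
student = [[0.0, 4.0], [4.0, 0.0]]

loss_one_to_one = one_to_one_loss(teacher, student)
loss_one_to_all = one_to_all_loss(teacher, student)
print(loss_one_to_one, loss_one_to_all)
```

In this toy case the one-to-one loss penalizes the student heavily even though it encodes the same information at different locations, while the similarity-weighted one-to-all loss is close to zero.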

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/168919