Block-wisely supervised neural architecture search with knowledge distillation

Li, C; Peng, J; Yuan, L; Wang, G; Liang, X; Lin, L; Chang, X

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020, 00, pp. 1986-1995
Issue Date: 2020-01-01

Closed Access

	Filename	Description	Size	Format
	Li_Block-Wisely_Supervised_Neural_Architecture_Search_With_Knowledge_Distillation_CVPR_2020_paper.pdf	Published version	736.52 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Li, C
dc.contributor.author	Peng, J
dc.contributor.author	Yuan, L
dc.contributor.author	Wang, G
dc.contributor.author	Liang, X
dc.contributor.author	Lin, L
dc.contributor.author	Chang, X https://orcid.org/0000-0002-7778-8807
dc.date	2020-06-14
dc.date.accessioned	2023-03-31T10:34:11Z
dc.date.available	2023-03-31T10:34:11Z
dc.date.issued	2020-01-01
dc.identifier.citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020, 00, pp. 1986-1995
dc.identifier.issn	1063-6919
dc.identifier.uri	http://hdl.handle.net/10453/168992
dc.language	en
dc.publisher	IEEE
dc.relation	http://purl.org/au-research/grants/arc/DE190100626
dc.relation.ispartof	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.relation.ispartof	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
dc.relation.ispartofseries	IEEE Conference on Computer Vision and Pattern Recognition
dc.relation.isbasedon	10.1109/CVPR42600.2020.00206
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Block-wisely supervised neural architecture search with knowledge distillation
dc.type	Conference Proceeding
utslib.citation.volume	00
utslib.location.activity	ELECTR NETWORK
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
utslib.copyright.status	closed_access	*
dc.date.updated	2023-03-31T10:34:09Z
pubs.finish-date	2020-06-19
pubs.publication-status	Published
pubs.start-date	2020-06-14
pubs.volume	00

Abstract:

Neural Architecture Search (NAS), aiming at automatically designing network architectures by machines, is expected to bring about a new revolution in machine learning. Despite these high expectations, the effectiveness and efficiency of existing NAS solutions are unclear, with some recent works going so far as to suggest that many existing NAS solutions are no better than random architecture selection. The ineffectiveness of NAS solutions may be attributed to inaccurate architecture evaluation. Specifically, to speed up NAS, recent works have proposed under-training different candidate architectures in a large search space concurrently by using shared network parameters; however, this has resulted in incorrect architecture ratings and furthered the ineffectiveness of NAS. In this work, we propose to modularize the large search space of NAS into blocks to ensure that the potential candidate architectures are fully trained; this reduces the representation shift caused by the shared parameters and leads to the correct rating of the candidates. Thanks to the block-wise search, we can also evaluate all of the candidate architectures within each block. Moreover, we find that the knowledge of a network model lies not only in the network parameters but also in the network architecture. Therefore, we propose to distill the neural architecture (DNA) knowledge from a teacher model to supervise our block-wise architecture search, which significantly improves the effectiveness of NAS. Remarkably, the performance of our searched architectures has exceeded the teacher model, demonstrating the practicability of our method. Finally, our method achieves a state-of-the-art 78.4% top-1 accuracy on ImageNet in a mobile setting. All of our searched models along with the evaluation code are available at https://github.com/changlin31/DNA.
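
The block-wise supervision described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation (see the repository linked above for that). It assumes each candidate block receives the teacher's features as input and is rated by mean squared error against the teacher's output for the same block. The function `rate_blocks`, the MSE criterion, and the hand-written "blocks" are all hypothetical, and the candidate-training step is omitted entirely.

```python
# Toy sketch of block-wise candidate rating under teacher supervision.
# The names and the MSE criterion are illustrative assumptions,
# not taken from the paper's code.
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Block = Callable[[Vector], Vector]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rate_blocks(
    teacher_blocks: List[Block],
    candidates_per_block: List[Dict[str, Block]],
    inputs: List[Vector],
) -> List[str]:
    """For each block, pick the candidate whose output best matches the teacher's."""
    chosen: List[str] = []
    features = inputs  # teacher features entering the current block
    for teacher, candidates in zip(teacher_blocks, candidates_per_block):
        targets = [teacher(x) for x in features]
        # Every candidate in the block is evaluated, each on the teacher's input,
        # so blocks are rated independently of one another.
        losses = {
            name: sum(mse(cand(x), t) for x, t in zip(features, targets)) / len(features)
            for name, cand in candidates.items()
        }
        chosen.append(min(losses, key=losses.get))
        features = targets  # the next block again starts from teacher features
    return chosen


# Hypothetical two-block teacher and hand-written candidate blocks.
teacher = [lambda v: [2.0 * x for x in v], lambda v: [x + 1.0 for x in v]]
candidates = [
    {"scale_2": lambda v: [2.0 * x for x in v], "scale_3": lambda v: [3.0 * x for x in v]},
    {"shift_0": lambda v: list(v), "shift_1": lambda v: [x + 1.0 for x in v]},
]
best = rate_blocks(teacher, candidates, [[1.0, 2.0], [0.5, -1.0]])
print(best)
```

Because each block is compared against the teacher separately, the number of candidates to rate grows with the sum of the per-block candidate counts rather than their product, which is what makes evaluating every candidate within a block feasible in this sketch.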

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/168992