Overcoming multi-model forgetting in one-shot NAS with diversity maximization

Zhang, M; Li, H; Pan, S; Chang, X; Su, S

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020, 00, pp. 7806-7815
Issue Date: 2020-01-01

Filename	Description	Size	Format
09156354.pdf	Published version	1.75 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, M
dc.contributor.author	Li, H
dc.contributor.author	Pan, S https://orcid.org/0000-0003-0794-527X
dc.contributor.author	Chang, X
dc.contributor.author	Su, S
dc.date	2020-06-13
dc.date.accessioned	2021-04-16T01:06:26Z
dc.date.available	2021-04-16T01:06:26Z
dc.date.issued	2020-01-01
dc.identifier.citation	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020, 00, pp. 7806-7815
dc.identifier.issn	1063-6919
dc.identifier.uri	http://hdl.handle.net/10453/148153
dc.description.abstract	One-Shot Neural Architecture Search (NAS) significantly improves computational efficiency through weight sharing. However, this approach also introduces multi-model forgetting during supernet training (the architecture search phase), where the performance of previously trained architectures degrades when new architectures with partially shared weights are trained sequentially. To overcome such catastrophic forgetting, the state-of-the-art method assumes that the shared weights are optimal when jointly optimizing a posterior probability. However, this strict assumption does not necessarily hold for One-Shot NAS in practice. In this paper, we formulate supernet training in One-Shot NAS as a constrained optimization problem of continual learning, in which learning the current architecture should not degrade the performance of previous architectures. We propose a Novelty Search based Architecture Selection (NSAS) loss function and demonstrate that the posterior probability can be calculated without the strict assumption when the diversity of the selected constraints is maximized. A greedy novelty search method is devised to find the most representative subset to regularize the supernet training. We apply our proposed approach to two One-Shot NAS baselines, random sampling NAS (RandomNAS) and gradient-based sampling NAS (GDAS). Extensive experiments demonstrate that our method enhances the predictive ability of the supernet in One-Shot NAS and achieves remarkable performance on CIFAR-10, CIFAR-100, and PTB while remaining efficient.
dc.language	en
dc.publisher	IEEE
dc.relation	http://purl.org/au-research/grants/arc/DE190100626
dc.relation.ispartof	Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
dc.relation.ispartof	2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
dc.relation.isbasedon	10.1109/CVPR42600.2020.00783
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Overcoming multi-model forgetting in one-shot NAS with diversity maximization
dc.type	Conference Proceeding
utslib.citation.volume	00
utslib.for	0801 Artificial Intelligence and Image Processing
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
utslib.copyright.status	closed_access	*
dc.date.updated	2021-04-16T01:06:24Z
pubs.finish-date	2020-06-19
pubs.publication-status	Published
pubs.start-date	2020-06-13
pubs.volume	00

Abstract:

One-Shot Neural Architecture Search (NAS) significantly improves computational efficiency through weight sharing. However, this approach also introduces multi-model forgetting during supernet training (the architecture search phase), where the performance of previously trained architectures degrades when new architectures with partially shared weights are trained sequentially. To overcome such catastrophic forgetting, the state-of-the-art method assumes that the shared weights are optimal when jointly optimizing a posterior probability. However, this strict assumption does not necessarily hold for One-Shot NAS in practice. In this paper, we formulate supernet training in One-Shot NAS as a constrained optimization problem of continual learning, in which learning the current architecture should not degrade the performance of previous architectures. We propose a Novelty Search based Architecture Selection (NSAS) loss function and demonstrate that the posterior probability can be calculated without the strict assumption when the diversity of the selected constraints is maximized. A greedy novelty search method is devised to find the most representative subset to regularize the supernet training. We apply our proposed approach to two One-Shot NAS baselines, random sampling NAS (RandomNAS) and gradient-based sampling NAS (GDAS). Extensive experiments demonstrate that our method enhances the predictive ability of the supernet in One-Shot NAS and achieves remarkable performance on CIFAR-10, CIFAR-100, and PTB while remaining efficient.
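
The abstract mentions a greedy novelty search that picks a diverse subset of architectures to act as constraints. The abstract does not give the algorithm itself, so the following is only a minimal illustrative sketch of the general idea of greedy diversity maximization, not the paper's method. Everything in it is an assumption made for illustration: the tuple encoding of an architecture (one operation index per edge), the Hamming distance, the mean-distance novelty score, and the function names `hamming`, `novelty`, and `greedy_diverse_subset`.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding (assumption): one operation index per edge of a cell.
Arch = Tuple[int, ...]


def hamming(a: Arch, b: Arch) -> int:
    """Count the positions at which two equal-length encodings differ."""
    return sum(x != y for x, y in zip(a, b))


def novelty(candidate: Arch, reference: Sequence[Arch]) -> float:
    """Mean distance from the candidate to a non-empty reference set."""
    return sum(hamming(candidate, r) for r in reference) / len(reference)


def greedy_diverse_subset(current: Arch, pool: Sequence[Arch], k: int) -> List[Arch]:
    """Greedily pick up to k architectures from the pool.

    Each step adds the remaining architecture with the highest novelty
    relative to the current architecture plus everything chosen so far.
    """
    chosen: List[Arch] = []
    remaining = list(pool)
    while remaining and len(chosen) < k:
        reference = [current] + chosen
        best = max(remaining, key=lambda a: novelty(a, reference))
        chosen.append(best)
        remaining.remove(best)
    return chosen


current = (0, 0, 0, 0)
pool = [(0, 0, 0, 1), (1, 1, 1, 1), (0, 0, 1, 1), (2, 2, 0, 0)]
print(greedy_diverse_subset(current, pool, k=2))
# prints [(1, 1, 1, 1), (2, 2, 0, 0)]
```

In this toy run the first pick is the architecture farthest from the current one, and the second is the one farthest on average from both. In a supernet-training setting, a subset selected this way could serve as the set of previously visited architectures whose losses are added as a regularization term, though how the paper weights or combines those terms is not stated in the abstract.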

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/148153