Learning Neural Network Architecture from Data: NAS and Dynamic Networks

Publication Type:
Thesis
Issue Date:
2023
A myriad of breakthroughs in neural network architecture have brought significant improvements across a wide range of deep learning tasks. Despite the large advances enabled by network design, manually finding a well-optimized network architecture remains challenging given the enormous number of design choices. Automatically learning neural network architecture from data, e.g., neural architecture search (NAS) and dynamic networks, offers a new approach to architecture design. These emerging data-dependent methods hold great potential but still face critical unsolved problems. On one hand, the efficiency and effectiveness of NAS cannot be guaranteed at the same time, because the large search space leads to inaccurate architecture ratings. On the other hand, for dynamic networks, the dynamic sparse patterns applied to convolutional filters in dynamic pruning methods fail to achieve actual acceleration in real-world implementations, owing to the extra overhead of indexing, weight copying, or zero masking. We therefore propose two novel NAS methods and one dynamic network method to overcome these issues. First, to improve the effectiveness of NAS, we modularize its large search space into blocks and use the block-wise representations of existing models to supervise the architecture search, distilling the neural architecture knowledge from a teacher model; we name this method DNA. Second, to cast off the yoke of the teacher architecture, we further propose an unsupervised NAS method named Block-wisely Self-Supervised Neural Architecture Search (BossNAS). Finally, to address the aforementioned issue of dynamic networks, we explore a dynamic network slimming regime, named Dynamic Slimmable Network (DS-Net), which aims to achieve good hardware efficiency by dynamically adjusting the number of filters at test time with respect to different inputs, while keeping the filters stored statically and contiguously in hardware to avoid that extra overhead.
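To illustrate the contiguous-slicing idea behind DS-Net, the following minimal PyTorch sketch shows a convolution layer whose active output width is chosen per input by taking the first k filters of a single statically stored weight tensor, so no indexing, weight copying, or zero masking is involved. The class name `SlimmableConv2d`, the hand-picked widths, and the omission of the input-dependent width-selection policy are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlimmableConv2d(nn.Module):
    """Convolution whose active output width is chosen per input at test time.

    The full weight tensor is stored once, statically and contiguously; a
    narrower sub-network is obtained by slicing the first `out_active`
    filters, avoiding indexing, weight copying, or zero masking.
    """

    def __init__(self, in_channels, max_out_channels, kernel_size, padding=0):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(max_out_channels, in_channels, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(max_out_channels))
        self.padding = padding

    def forward(self, x, out_active):
        # Slice a contiguous block of filters; the slice is a view, not a copy.
        w = self.weight[:out_active]
        b = self.bias[:out_active]
        return F.conv2d(x, w, b, padding=self.padding)


if __name__ == "__main__":
    conv = SlimmableConv2d(in_channels=3, max_out_channels=64, kernel_size=3, padding=1)
    x = torch.randn(1, 3, 32, 32)
    # In DS-Net the width would be predicted from the input; here we pick widths by hand.
    for width in (16, 32, 64):
        y = conv(x, out_active=width)
        print(width, tuple(y.shape))  # e.g. 16 (1, 16, 32, 32)
```

Because each narrower sub-network reuses a contiguous prefix of the stored filters, the sliced weights remain dense and cache-friendly, which is what allows the dynamic width adjustment to translate into real hardware speedups.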