Parameter-Efficient Vision Transformer with Linear Attention
- Publisher: IEEE
- Publication Type: Conference Proceeding
- Citation: 2023 IEEE International Conference on Image Processing (ICIP), 2023, 00, pp. 1275-1279
- Issue Date: 2023-09-11
Closed Access
Filename | Description | Size |
---|---|---|
Parameter-Efficient_Vision_Transformer_with_Linear_Attention.pdf | Published version | 962.42 kB |
This item is closed access and not available.
Recent advances in vision transformers (ViTs) have achieved outstanding performance in visual recognition tasks, including image classification and detection. ViTs can learn global representations with their self-attention mechanism, but they are usually heavyweight and unsuitable for resource-constrained devices. In this paper, we propose a novel linear feature attention (LFA) module to reduce computation costs for vision transformers and combine efficient mobile CNN modules to form a parameter-efficient and high-performance CNN-ViT hybrid model, called LightFormer, which can serve as a general-purpose backbone to learn both global and local representations. Comprehensive experiments demonstrate that LightFormer achieves competitive performance across different visual recognition tasks. On the ImageNet-1K dataset, LightFormer achieves a top-1 accuracy of 78.5% with 5.5 million parameters. Our model also performs well when transferred to object detection and semantic segmentation tasks. On the MS COCO dataset, LightFormer attains an mAP of 33.2 within the YOLOv3 framework, and on the Cityscapes dataset, with only a simple all-MLP decoder, LightFormer achieves an mIoU of 78.5% and an FPS of 15.3, surpassing state-of-the-art lightweight segmentation networks.