Learning Hierarchical Semantic Information for Efficient Low-Light Image Enhancement

Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Conference Proceeding
Citation:
Proceedings of the International Joint Conference on Neural Networks, 2023, 2023-June, pp. 1-8
Issue Date:
2023-01-01
Abstract:
Low-light environments cause a variety of complex degradation problems that result in poor visibility in images. As a classical vision task, low-light image enhancement has attracted increasing interest in the research community. However, existing methods tend to require a large number of parameters, making them difficult to deploy and optimize, especially on resource-constrained devices. In this paper, we focus on making the method lightweight and propose a novel end-to-end two-stage CNN-ViT architecture (HSINet) that efficiently learns hierarchical semantic information (HSI) from low-light images. HSINet consists of two stages: the first is a CNN-based low-level semantic (LS) Stage, and the second is a ViT-based high-level semantic (HS) Stage. The LS Stage contains an efficient multi-scale convolution block, the MLS Block, for low-level semantic feature extraction. The HS Stage, in turn, learns high-level semantic features by exploiting the ViT's strong global-modeling capability. We propose a hierarchical Swin Transformer-based block, the HS Block, which gradually enlarges the Swin Transformer's window size as the network deepens in order to learn hierarchical high-level semantic information. Benefiting from this efficient architecture, our model contains only 0.6M parameters, far fewer than existing state-of-the-art methods. We evaluate the method on three challenging benchmark datasets, LOL, VE-LOL, and MIT-Adobe FiveK, using three popular evaluation metrics. Both quantitative and qualitative results show that the proposed method not only outperforms state-of-the-art methods in terms of PSNR, SSIM, LPIPS, and visual quality, but also achieves better efficiency.
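The two-stage design described in the abstract can be illustrated with a small PyTorch sketch. The module names (HSINet, MLSBlock, HSBlock) follow the paper's terminology, but every internal detail below (layer choices, channel counts, window sizes, residual connections) is an illustrative assumption rather than the authors' implementation, and the HS Block is approximated with plain windowed multi-head attention instead of a full Swin Transformer block.

```python
# Minimal sketch of the two-stage CNN-ViT idea: a CNN-based LS Stage followed by a
# windowed-attention HS Stage whose window size grows with depth. All hyperparameters
# are assumptions for illustration, not the paper's configuration.
import torch
import torch.nn as nn


class MLSBlock(nn.Module):
    """LS Stage block: assumed parallel 3x3 / 5x5 branches fused by a 1x1 conv."""
    def __init__(self, channels):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        self.act = nn.GELU()

    def forward(self, x):
        y = torch.cat([self.branch3(x), self.branch5(x)], dim=1)
        return x + self.act(self.fuse(y))


class HSBlock(nn.Module):
    """HS Stage block: self-attention inside non-overlapping windows of a given size."""
    def __init__(self, channels, window_size, heads=4):
        super().__init__()
        self.window = window_size
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        ws = self.window
        # Partition the feature map into ws x ws windows and attend within each window.
        t = x.view(b, c, h // ws, ws, w // ws, ws)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        t = self.norm(t)
        t = t + self.attn(t, t, t, need_weights=False)[0]
        # Undo the window partition back to (b, c, h, w).
        t = t.view(b, h // ws, w // ws, ws, ws, c)
        return t.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)


class HSINet(nn.Module):
    """LS Stage (stacked MLS Blocks) then HS Stage (HS Blocks with growing windows)."""
    def __init__(self, channels=16, window_sizes=(4, 8, 16)):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        self.ls_stage = nn.Sequential(*[MLSBlock(channels) for _ in range(3)])
        self.hs_stage = nn.Sequential(*[HSBlock(channels, ws) for ws in window_sizes])
        self.tail = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x):
        shallow = self.head(x)
        f = self.hs_stage(self.ls_stage(shallow))
        return torch.sigmoid(self.tail(f + shallow))


if __name__ == "__main__":
    net = HSINet()
    # Input height/width must be divisible by the largest window size (16 here).
    out = net(torch.rand(1, 3, 128, 128))
    print(out.shape, sum(p.numel() for p in net.parameters()))
```

This sketch only mirrors the overall structure; reproducing the reported 0.6M-parameter budget and results would require the paper's actual block designs and training setup.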