Context-aware Image Semantic Segmentation

Publication Type:
Thesis
Issue Date:
2022
Abstract:
Semantic segmentation is a fundamental task for computer vision applications, yet existing solutions struggle with difficult cases. This thesis develops three novel approaches that improve the generalization ability of existing solutions at significantly reduced computational cost. Extensive experiments on multiple benchmark datasets demonstrate the superior performance of the proposed approaches.

Scale-invariant: State-of-the-art semantic segmentation solutions usually leverage different receptive fields via multiple parallel branches to handle objects of different sizes. However, employing a separate kernel for each branch degrades the network's generalization to objects of different scales, and the computational cost grows with the number of branches. This thesis proposes a novel network structure, Kernel-Sharing Atrous Convolution (KSAC), in which branches with different receptive fields share the same kernel, i.e., a single kernel "sees" the input feature maps more than once, each time with a different receptive field (a minimal sketch follows this abstract).

Seamless dual attention: Spatial attention and channel attention, which model semantic inter-dependencies in the spatial and channel dimensions respectively, have recently been widely used for semantic segmentation. However, computing the two attentions separately sometimes causes errors, especially in difficult cases. In this research, Channelized Axial Attention (CAA) is developed to seamlessly integrate channel attention and spatial attention into a single operation with negligible computational overhead. Furthermore, a novel grouped vectorization approach allows the proposed model to run with very little memory consumption and without slowing down the computation (see the second sketch below).

Class-aware regularization: Recent segmentation methods that utilize class-level information in addition to pixel features have achieved notable success in boosting the accuracy of existing network models. However, the extracted class-level information is simply concatenated to the pixel features rather than explicitly exploited to learn better pixel representations. Moreover, these approaches learn soft class centers from coarse mask predictions, which is prone to error accumulation. Motivated by the fact that humans can recognize an object on its own, no matter which other objects it appears with, and aiming to use class-level information more effectively, a universal Class-Aware Regularization (CAR) approach is proposed to optimize the intra-class variance and inter-class distance during feature learning. Furthermore, the class centers in the proposed approach are generated directly from the ground truth rather than from error-prone coarse predictions. CAR can easily be applied to most existing segmentation models and largely improves their accuracy at no additional inference cost (the third sketch below illustrates the losses).
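The shared-kernel idea behind KSAC lends itself to a compact implementation. Below is a minimal PyTorch sketch, not the thesis code: the same 3x3 weight tensor is applied at several dilation rates, so adding a branch widens the receptive field without adding parameters. The class name, the default rates, and merging the per-rate outputs by summation are assumptions of this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KernelSharingAtrousConv(nn.Module):
    """Minimal sketch of Kernel-Sharing Atrous Convolution (KSAC).

    One 3x3 weight tensor is reused across all dilation rates, so the
    same kernel "sees" the input with several receptive fields.
    """

    def __init__(self, in_channels, out_channels, rates=(6, 12, 18)):
        super().__init__()
        self.rates = rates
        # Single shared kernel for every branch.
        self.weight = nn.Parameter(torch.empty(out_channels, in_channels, 3, 3))
        nn.init.kaiming_normal_(self.weight)
        self.bn = nn.BatchNorm2d(out_channels)

    def forward(self, x):
        # Apply the *same* weights at each dilation rate; for a 3x3 kernel,
        # padding == dilation keeps the spatial size unchanged.
        out = sum(F.conv2d(x, self.weight, padding=r, dilation=r)
                  for r in self.rates)
        return F.relu(self.bn(out))


# Example: three receptive fields, one kernel's worth of parameters.
feats = torch.randn(2, 256, 33, 33)
print(KernelSharingAtrousConv(256, 256)(feats).shape)  # torch.Size([2, 256, 33, 33])
```

Note the contrast with an ASPP-style module, where each branch holds its own kernel: here the parameter count is constant in the number of rates, which is what the abstract credits for both the reduced cost and the improved scale generalization.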
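The claim that CAA fuses channel and spatial attention "into a single operation" can be illustrated as follows. This is a heavily simplified sketch under stated assumptions, not the thesis implementation: it covers the height axis only, and the sigmoid self-gating stands in for whatever channel-attention formulation the thesis uses. The chunked loop shows how a grouped vectorization can bound memory without serializing the computation per pixel.

```python
import torch
import torch.nn as nn

class ChannelizedAxialAttention(nn.Module):
    """Minimal sketch of the CAA idea, height axis only.

    Instead of computing spatial and channel attention as two separate
    modules, the per-key contributions of axial attention are re-weighted
    channel-wise *before* being summed over keys. Layer names, the gating
    function, and the mid-channel width are assumptions of this sketch.
    """

    def __init__(self, channels, mid_channels=64):
        super().__init__()
        self.query = nn.Conv2d(channels, mid_channels, 1)
        self.key = nn.Conv2d(channels, mid_channels, 1)
        self.value = nn.Conv2d(channels, channels, 1)

    def forward(self, x, group_size=16):
        b, c, h, w = x.shape
        q = self.query(x).permute(0, 3, 2, 1)  # (B, W, H, Cm)
        k = self.key(x).permute(0, 3, 1, 2)    # (B, W, Cm, H)
        v = self.value(x).permute(0, 3, 2, 1)  # (B, W, H, C)

        # Plain axial (height-axis) spatial attention.
        a = torch.softmax(q @ k, dim=-1)       # (B, W, Hq, Hk)

        out = torch.empty_like(v)
        # Grouped vectorization: materializing all per-key contributions at
        # once needs a (B, W, H, H, C) tensor; chunking the query rows caps
        # the intermediate at (B, W, group_size, H, C).
        for i in range(0, h, group_size):
            # Per-key weighted values, kept separate per channel.
            contrib = a[:, :, i:i + group_size, :, None] * v[:, :, None, :, :]
            # Channel attention applied inside the spatial attention
            # (sigmoid self-gating is an assumption, not the thesis form).
            out[:, :, i:i + group_size] = (contrib * torch.sigmoid(contrib)).sum(dim=3)
        return out.permute(0, 3, 2, 1)          # back to (B, C, H, W)
```

The key structural point matches the abstract: the channel re-weighting happens on the intermediate attention result rather than in a parallel branch, so the two attentions cannot disagree on difficult pixels, and `group_size` trades peak memory against the degree of vectorization.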
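The CAR objectives translate naturally into two auxiliary training losses. The sketch below is one plausible instantiation, not the thesis formulation: `car_losses`, the cosine distance, mean pooling for the class centers, and the convention that `labels` is already downsampled to the feature resolution are all assumptions introduced here for illustration. What it does preserve from the abstract is that the centers come from the ground-truth mask, not from an error-prone coarse prediction.

```python
import torch
import torch.nn.functional as F

def car_losses(features, labels, num_classes, ignore_index=255):
    """Minimal sketch of Class-Aware Regularization (CAR) losses.

    features: (B, C, H, W) pixel features.
    labels:   (B, H, W) int64 ground-truth ids at feature resolution.
    """
    b, c, h, w = features.shape
    feats = features.permute(0, 2, 3, 1).reshape(-1, c)    # (N, C)
    labels = labels.reshape(-1)                            # (N,)
    keep = labels != ignore_index
    feats, labels = feats[keep], labels[keep]

    # Ground-truth class centers: per-class mean of pixel features.
    onehot = F.one_hot(labels, num_classes).float()        # (N, K)
    counts = onehot.sum(dim=0)                             # (K,)
    centers = (onehot.t() @ feats) / counts.clamp(min=1)[:, None]

    # Intra-class variance: pull every pixel toward its own class center.
    intra_loss = (1 - F.cosine_similarity(feats, centers[labels], dim=1)).mean()

    # Inter-class distance: push apart centers of classes in the batch.
    present = F.normalize(centers[counts > 0], dim=1)      # (P, C)
    sim = present @ present.t()
    off_diag = ~torch.eye(len(present), dtype=torch.bool, device=sim.device)
    inter_loss = sim[off_diag].mean() if len(present) > 1 else sim.new_zeros(())
    return intra_loss, inter_loss
```

Because both terms are computed only during training and discarded afterwards, the segmentation network itself is unchanged at test time, which is consistent with the abstract's claim of no additional inference cost.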