Towards Image Semantic Segmentation: From Context to Language

Tang, Huadong

Publication Type: Thesis
Issue Date: 2025

Full metadata record

Field	Value	Language
dc.contributor.author	Tang, Huadong
dc.date.accessioned	2026-05-27T01:32:58Z
dc.date.available	2026-05-27T01:32:58Z
dc.date.issued	2025
dc.identifier.uri	http://hdl.handle.net/10453/195155
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_US.UTF-8
dc.format	Thesis (PhD)
dc.language.iso	en_US	en_US.UTF-8
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/195155/1/thesis.pdf
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	© 2025 Huadong Tang
dc.rights	au.edu.uts.lib/cph
dc.title	Towards Image Semantic Segmentation: From Context to Language	en_US.UTF-8
dc.type	Thesis
utslib.copyright.status	open_access	*

Abstract:

Semantic image segmentation is crucial for contemporary computer vision applications. It aims to classify each pixel into a specific category (class). Despite significant advances in semantic segmentation, current methods still face challenges such as low efficiency and, owing to structural limitations, limited capture of contextual dependencies. This research focuses on improving semantic segmentation algorithms in three respects.

Class-Aware Contextual Information: Leveraging contextual dependencies is a common technique for enhancing segmentation performance. However, existing solutions focus mainly on local pixel-to-pixel relations and do not effectively capture the class-level association between pixels along the boundaries between objects of different classes. This thesis proposes a Class-Aware Affinity (CAA) module that considers both pixel-to-pixel relations and pixel-to-class associations.

Extended Context-Aware Classifier: The vanilla classifier encodes global information from the training data in a fixed set of parameters (weights and biases). Because each image has a different class distribution, these fixed parameters prevent the classifier from adapting to the characteristics of individual images. At the dataset level, class imbalance biases segmentation results towards majority classes, limiting the model's effectiveness in identifying and segmenting minority-class regions. This research proposes an Extended Context-Aware Classifier (ECAC) that dynamically adjusts the classifier using both global (dataset-level) and local (image-level) contextual information.

Open-Vocabulary Semantic Segmentation: Open-vocabulary semantic segmentation relies on precise pixel-level alignment of visual and textual representations, using text as a universal reference to bridge visual disparities across diverse datasets. While prior work has focused primarily on improving visual representations or alignment models, the pivotal role of textual representations has often been neglected. This research proposes a novel approach that harnesses large language models (LLMs) to produce enriched text prompts, replacing rudimentary templates with semantically detailed descriptions.
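
The open-vocabulary setting described above labels pixels by matching visual embeddings against text embeddings of class names or descriptions. The sketch below is a generic illustration under stated assumptions, not the method proposed in this thesis: it assumes pixel and text embeddings already live in one shared space (as a CLIP-style vision-language model would produce), and it uses small hand-made vectors in place of real encoder outputs. The function names `build_class_embeddings` and `segment` are hypothetical. Each class is represented by several description embeddings (for example, one from a basic template and others from LLM-written descriptions), which are averaged into a single prototype; every pixel is then assigned to the most cosine-similar class.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-8) -> np.ndarray:
    """Scale vectors along `axis` to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_class_embeddings(descriptions: dict[str, np.ndarray]) -> tuple[list[str], np.ndarray]:
    """Hypothetical helper: average normalized description embeddings per class.

    `descriptions` maps a class name to an array of shape (num_descriptions, D),
    e.g. the embedding of a plain template plus embeddings of LLM-written
    descriptions (assumed to come from some text encoder, not shown here).
    Returns the class names and a (C, D) matrix of unit-length class prototypes.
    """
    names = list(descriptions)
    protos = [l2_normalize(descriptions[n]).mean(axis=0) for n in names]
    return names, l2_normalize(np.stack(protos))


def segment(pixel_feats: np.ndarray, class_embs: np.ndarray) -> np.ndarray:
    """Hypothetical helper: label each pixel with its most cosine-similar class.

    pixel_feats: (H, W, D) visual embeddings; class_embs: (C, D) unit vectors.
    Returns an (H, W) array of class indices.
    """
    logits = l2_normalize(pixel_feats) @ class_embs.T  # (H, W, C) cosine similarities
    return logits.argmax(axis=-1)


# Toy 3-D vectors standing in for real text and image encoder outputs.
descriptions = {
    "road": np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.2]]),
    "sky": np.array([[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]]),
}
names, class_embs = build_class_embeddings(descriptions)
pixels = np.array([[[0.8, 0.1, 0.1], [0.0, 1.0, 0.0]]])  # shape (H=1, W=2, D=3)
labels = segment(pixels, class_embs)
print([names[i] for i in labels.ravel()])  # → ['road', 'sky']
```

Averaging several normalized description embeddings per class is a common prompt-ensembling choice; under this sketch, richer descriptions change the class prototypes and therefore the per-pixel decisions, which is the lever on the text side that the abstract points to.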

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/195155