Generative models for colorization of visual data
- Publication Type: Thesis
- Issue Date: 2025
Open Access
This item is open access.
Colorization, a well-known problem in computer vision, is the process of adding color to grayscale or monochrome images, videos, or other visual data. Because the task is ill-posed, image colorization is inherently challenging: a single grayscale input admits many plausible colorings. Although researchers have made several attempts to automate the colorization pipeline, their methods often produce unrealistic results due to a lack of conditioning. Since real-world scenes contain objects of widely varying colors, a major challenge for colorization techniques is handling this variation, which typically requires a strong semantic understanding of the scene. However, once this semantic information is introduced, blending the colors naturally becomes difficult.

This thesis introduces a comprehensive framework for image colorization that addresses the challenges of consistency, realism, and scalability in real-world scenes. We first provide a critical survey of research on image and video colorization in Chapter 2, where existing methods are comprehensively reviewed and discussed: we compare existing strategies and extensively analyze their advantages, disadvantages, and performance. Motivated by the limitations of these methods, we propose several novel techniques for more robust colorization. We first explore the long-range dependencies of the $\lambda$ Network for image colorization (Chapter 3). Next, object information is introduced as an additional input to improve image colorization via a cross-attention mechanism (Chapter 4). We also integrate textual descriptions of the grayscale image as an auxiliary condition to improve the fidelity of the colorization process; here, to train a larger network with sufficient gradient flow, the RRDB (Residual-in-Residual Dense Block) architecture is explored (Chapter 5).
Further, to reduce the inherent ambiguity of object color, we introduce a novel multi-modal strategy that incorporates object information, together with the corresponding color information, into the colorization process (Chapter 6). To obtain more realistic output, we devise an adversarial training scheme based on a GAN, and later introduce a diffusion-based method for superior performance (Chapter 7). As no existing dataset provides rich textual descriptions of objects together with their color information, we also introduce a new dataset to support model training. Finally, we refine the diffusion-based technique to operate without additional guidance, yielding a more realistic colorization model that works in the wild without color information (Chapter 8).
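To illustrate the cross-attention conditioning idea mentioned above (object information fused into image features), the sketch below shows a minimal scaled dot-product cross-attention step in NumPy. The function name, feature dimensions, and randomly initialised projection matrices (standing in for learned weights) are illustrative assumptions, not the thesis's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_feats, object_feats, d_k=16, seed=0):
    """Fuse object features into grayscale-image features:
    queries come from the image tokens, keys/values from the
    detected-object tokens (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    d_img = image_feats.shape[-1]
    d_obj = object_feats.shape[-1]
    # random projections stand in for learned weight matrices
    W_q = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    W_k = rng.standard_normal((d_obj, d_k)) / np.sqrt(d_obj)
    W_v = rng.standard_normal((d_obj, d_img)) / np.sqrt(d_obj)
    Q = image_feats @ W_q              # (n_pixels, d_k)
    K = object_feats @ W_k             # (n_objects, d_k)
    V = object_feats @ W_v             # (n_objects, d_img)
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_pixels, n_objects)
    # residual fusion keeps the output the same shape as the input
    return image_feats + attn @ V

# toy example: 64 pixel tokens (dim 32), 5 object tokens (dim 48)
img = np.random.default_rng(1).standard_normal((64, 32))
obj = np.random.default_rng(2).standard_normal((5, 48))
out = cross_attention(img, obj)
print(out.shape)  # (64, 32)
```

The key design point this sketch captures is that attention weights let every pixel location selectively draw color-relevant information from whichever detected object it belongs to, rather than from a single global conditioning vector.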
