Generative models for colorization of visual data
- Publication Type: Thesis
- Issue Date: 2025
Open Access
This item is open access.
Colorization, a well-known problem in computer vision, is the process of adding color to grayscale or monochrome images, videos, or other visual data. Because the task is ill-posed, image colorization is inherently challenging: a single grayscale input admits many plausible colorings. Although researchers have made several attempts to automate the colorization pipeline, their methods often produce unrealistic results due to a lack of conditioning. Since real-world scenes contain objects of widely varying colors, a major challenge for colorization techniques is handling this variation, which typically requires a strong semantic understanding of the scene. However, once this semantic information is introduced, blending the colors naturally becomes difficult.

This thesis introduces a comprehensive framework for image colorization that addresses the challenges of consistency, realism, and scalability in real-world scenes. We first provide a critical survey of research on image and video colorization in Chapter 2, where existing methods are comprehensively reviewed and discussed: we compare existing strategies and extensively analyze their advantages, disadvantages, and performance. Motivated by the limitations of these methods, we propose several novel techniques for more robust colorization. We first explore the long-range dependencies of the $\lambda$ Network for image colorization (Chapter 3). Next, object information is introduced as an additional input to improve image colorization via a cross-attention mechanism (Chapter 4). We also integrate textual descriptions of the grayscale image as an auxiliary condition to improve the fidelity of the colorization process; here, to train a larger network with sufficient gradient flow, the RRDB (Residual-in-Residual Dense Block) architecture is explored (Chapter 5).
Further, to reduce the inherent ambiguity of object color, we introduce a novel multi-modal strategy that incorporates object information, together with the corresponding color information, into the colorization process (Chapter 6). To obtain more realistic output, we devise an adversarial training scheme based on a GAN, and later introduce a diffusion-based method for superior performance (Chapter 7). As no existing dataset provides rich textual descriptions of objects together with their color information, we also introduce a new dataset to support model training. Finally, we refine the diffusion-based technique to operate without additional guidance, yielding a more realistic colorization model that works in the wild without color information (Chapter 8).
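To illustrate the cross-attention conditioning idea mentioned above (object information fused into image features), the sketch below shows a minimal scaled dot-product cross-attention step in NumPy. The function name, feature dimensions, and randomly initialised projection matrices (standing in for learned weights) are illustrative assumptions, not the thesis's actual architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(image_feats, object_feats, d_k=16, seed=0):
    """Fuse object features into grayscale-image features:
    queries come from the image tokens, keys/values from the
    detected-object tokens (illustrative sketch only)."""
    rng = np.random.default_rng(seed)
    d_img = image_feats.shape[-1]
    d_obj = object_feats.shape[-1]
    # random projections stand in for learned weight matrices
    W_q = rng.standard_normal((d_img, d_k)) / np.sqrt(d_img)
    W_k = rng.standard_normal((d_obj, d_k)) / np.sqrt(d_obj)
    W_v = rng.standard_normal((d_obj, d_img)) / np.sqrt(d_obj)
    Q = image_feats @ W_q              # (n_pixels, d_k)
    K = object_feats @ W_k             # (n_objects, d_k)
    V = object_feats @ W_v             # (n_objects, d_img)
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (n_pixels, n_objects)
    # residual fusion keeps the output the same shape as the input
    return image_feats + attn @ V

# toy example: 64 pixel tokens (dim 32), 5 object tokens (dim 48)
img = np.random.default_rng(1).standard_normal((64, 32))
obj = np.random.default_rng(2).standard_normal((5, 48))
out = cross_attention(img, obj)
print(out.shape)  # (64, 32)
```

The key design point this sketch captures is that attention weights let every pixel location selectively draw color-relevant information from whichever detected object it belongs to, rather than from a single global conditioning vector.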
