TIC: text-guided image colorization using conditional generative model

Publisher:
SPRINGER
Publication Type:
Journal Article
Citation:
Multimedia Tools and Applications, 2023
Issue Date:
2023-01-01
Abstract:
Image colorization is a well-known problem in computer vision. However, due to its ill-posed nature, the task is inherently challenging. Although researchers have made several attempts to automate the colorization pipeline, these approaches often produce unrealistic results owing to a lack of conditioning. In this work, we integrate textual descriptions as an auxiliary condition, alongside the grayscale image to be colorized, to improve the fidelity of the colorization process. To the best of our knowledge, this is one of the first attempts to incorporate textual conditioning into the colorization pipeline. To this end, we propose a novel deep network that takes two inputs (the grayscale image and the corresponding encoded text description) and predicts the relevant color gamut. Because the textual descriptions carry color information about the objects in the scene, the text encoding helps improve the overall quality of the predicted colors. The proposed model has been evaluated using metrics such as SSIM, PSNR, and LPIPS, achieving scores of 0.917, 23.27, and 0.223, respectively. These quantitative results show that the proposed method outperforms state-of-the-art techniques in most cases.
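To illustrate one of the evaluation metrics reported above, the sketch below computes PSNR (peak signal-to-noise ratio) between a predicted colorization and a ground-truth image. This is a generic, minimal implementation on toy 2x2 pixel grids, not the authors' evaluation code; real evaluations operate on full RGB images, and the arrays here are purely hypothetical.

```python
import math

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between two images.

    `pred` and `target` are nested lists of pixel values
    (toy data for illustration only).
    """
    flat_p = [v for row in pred for v in row]
    flat_t = [v for row in target for v in row]
    mse = sum((p - t) ** 2 for p, t in zip(flat_p, flat_t)) / len(flat_p)
    if mse == 0:
        return float("inf")  # identical images: no noise
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy 2x2 "images": small pixel differences give a finite score.
a = [[100.0, 120.0], [130.0, 140.0]]
b = [[102.0, 118.0], [131.0, 139.0]]
print(round(psnr(a, b), 2))  # → 44.15
```

Higher PSNR indicates closer agreement with the ground truth; the paper's reported value of 23.27 dB is typical for colorization, where plausible colors may legitimately differ from the reference.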