Style-Aware Contrastive Learning for Multi-Style Image Captioning

Zhou, Y; Long, G

Publication Type: Conference Proceeding
Citation: EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023, 2023, pp. 2257-2267
Issue Date: 2023-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhou, Y
dc.contributor.author	Long, G https://orcid.org/0000-0003-3740-9515
dc.date.accessioned	2024-03-11T03:55:45Z
dc.date.available	2024-03-11T03:55:45Z
dc.date.issued	2023-01-01
dc.identifier.citation	EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023, 2023, pp. 2257-2267
dc.identifier.isbn	9781959429470
dc.identifier.uri	http://hdl.handle.net/10453/176459
dc.description.abstract	Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style and caption are matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.
dc.language	en
dc.relation.ispartof	EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023
dc.relation.isbasedon	10.48550/arXiv.2301.11367
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Style-Aware Contrastive Learning for Multi-Style Image Captioning
dc.type	Conference Proceeding
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
utslib.copyright.status	open_access	*
dc.date.updated	2024-03-11T03:55:44Z
pubs.publication-status	Published

Abstract:

Existing multi-style image captioning methods show promising results in generating a caption with accurate visual content and desired linguistic style. However, existing methods overlook the relationship between linguistic style and visual content. To overcome this drawback, we propose style-aware contrastive learning for multi-style image captioning. First, we present a style-aware visual encoder with contrastive learning to mine potential visual content relevant to style. Moreover, we propose a style-aware triplet contrast objective to distinguish whether the image, style and caption are matched. To provide positive and negative samples for contrastive learning, we present three retrieval schemes: object-based retrieval, RoI-based retrieval and triplet-based retrieval, and design a dynamic trade-off function to calculate retrieval scores. Experimental results demonstrate that our approach achieves state-of-the-art performance. In addition, we conduct an extensive analysis to verify the effectiveness of our method.
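The abstract names the components of the method but not their formulas. For orientation only, the sketch below shows a generic InfoNCE-style contrastive loss in which a hypothetical joint (image, style) embedding is the anchor, a retrieved matching caption embedding is the positive, and retrieved mismatched caption embeddings are the negatives. This is an assumption-laden illustration, not the paper's style-aware triplet contrast objective or its dynamic trade-off function, neither of which is specified in this record; the names `cosine`, `info_nce` and the `temperature` parameter are illustrative choices.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(
    anchor: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.1,
) -> float:
    """Generic InfoNCE loss (hypothetical illustration, not the authors' objective).

    Returns -log softmax probability of the positive among positive + negatives,
    using temperature-scaled cosine similarities as logits.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]


# Usage with toy 2-D embeddings (hypothetical values).
anchor = [1.0, 0.0]        # stands in for a fused image + style embedding
matched = [1.0, 0.0]       # caption judged to match image and style
mismatched = [[0.0, 1.0], [-1.0, 0.0]]
loss = info_nce(anchor, matched, mismatched, temperature=1.0)
```

The loss is small when the anchor is most similar to the matching caption and grows when a mismatched caption scores higher, which is the general behaviour any contrastive matching objective aims for.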

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/176459