Cross-domain multi-style merge for image captioning

Publisher:
ACADEMIC PRESS INC ELSEVIER SCIENCE
Publication Type:
Journal Article
Citation:
Computer Vision and Image Understanding, 2023, 228
Issue Date:
2023-02-01
File:
Cross domaing multi style merge for image captioning.pdf (Accepted version, Adobe PDF, 894.24 kB)
Multi-style image captioning has attracted wide attention recently. Existing approaches mainly rely on style synthesis within a single domain. They cannot handle combinations of multiple styles, since diverse styles naturally cannot be collected into a single uniform dataset. This paper is the first to investigate cross-domain multi-style merging for image captioning. Specifically, we propose a novel image captioning model with a multi-style gated transformer block to fit the cross-domain caption generation task. Conventional adversarial learning methods for language generation may suffer from the distribution distortion problem, since real datasets contain no captions with style combinations. We therefore devise a multi-stage self-learning framework that trains the proposed captioning model to gradually exploit real corpora annotated with pseudo styles. Comprehensive experiments and ablation studies demonstrate the effectiveness of the proposed method for multi-style merging in image captioning.
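The abstract does not specify the internals of the multi-style gated transformer block, but the core idea of gating features by a style signal can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the class name `MultiStyleGate`, the per-style gate parameters, and the blending of style weights are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MultiStyleGate:
    """Hypothetical sketch of a multi-style gate: one learned gate
    vector per style; a mixture over styles blends the gate logits,
    and the resulting sigmoid gate modulates the token features.
    (Illustrative only; the paper's block may differ substantially.)"""

    def __init__(self, num_styles: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Per-style gate logits (would be learned during training).
        self.gate_params = rng.standard_normal((num_styles, dim))

    def __call__(self, features: np.ndarray, style_mix: np.ndarray) -> np.ndarray:
        # style_mix: weights over styles; e.g. [0.5, 0.5] merges two styles,
        # which is the cross-domain "style combination" case from the abstract.
        logits = style_mix @ self.gate_params  # blend per-style gate logits
        gate = sigmoid(logits)                 # elementwise gates in (0, 1)
        return gate * features                 # style-modulated features

# Usage: merge two styles with equal weight on a toy feature vector.
gate = MultiStyleGate(num_styles=2, dim=4)
feats = np.ones(4)
merged = gate(feats, np.array([0.5, 0.5]))
```

In a full model, such a gate would sit inside each transformer layer, letting one decoder serve several style domains; the self-learning stages would then supply pseudo-styled targets for the style combinations absent from real data.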