Cross-domain multi-style merge for image captioning
- Publisher:
- ACADEMIC PRESS INC ELSEVIER SCIENCE
- Publication Type:
- Journal Article
- Citation:
- Computer Vision and Image Understanding, 2023, 228
- Issue Date:
- 2023-02-01
Closed Access
Filename | Description | Size
---|---|---
Cross domaing multi style merge for image captioning.pdf | Accepted version | 894.24 kB
This item is closed access and not available.
Multi-style image captioning has attracted wide attention recently. Existing approaches mainly rely on style synthesis within a single domain; they cannot handle combinations of multiple styles, since diverse styles naturally cannot be collected into a single unified dataset. This paper is the first to investigate the cross-domain multi-style merge for image captioning. Specifically, we propose a novel image captioning model with a multi-style gated transformer block to fit the cross-domain caption generation task. Conventional generative adversarial learning methods for language may suffer from the distribution distortion problem, since real datasets do not contain captions with style combinations. We therefore devise a multi-stage self-learning framework in which the proposed model gradually exploits the real corpus with pseudo styles. Comprehensive experiments and ablation studies demonstrate the effectiveness of the proposed method on the multi-style merge for image captioning.
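The paper itself is closed access, so no implementation details are reproduced here. The PyTorch sketch below only illustrates one plausible form a "multi-style gated transformer block" could take: a soft mixture over learned style embeddings gates each sublayer's output, so that fractional style weights can express a merge of styles. All names, dimensions, and the sigmoid-gating choice are assumptions for illustration, not the authors' actual design.

```python
# Hypothetical sketch of a multi-style gated transformer block.
# The gating scheme below is an assumption; the paper's real block
# is not publicly available.
import torch
import torch.nn as nn


class MultiStyleGatedBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_styles=4, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        # One learned gate vector per style; a soft mixture over styles
        # decides how strongly each sublayer output passes through.
        self.style_gates = nn.Embedding(n_styles, d_model)

    def forward(self, x, style_weights):
        # x: (batch, seq, d_model)
        # style_weights: (batch, n_styles), e.g. summing to 1, so a merged
        # style such as 0.5 humorous + 0.5 romantic is just a weight vector.
        gate = torch.sigmoid(style_weights @ self.style_gates.weight)  # (batch, d_model)
        gate = gate.unsqueeze(1)  # broadcast the gate over the sequence axis
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + gate * attn_out)   # gated residual, self-attention
        x = self.norm2(x + gate * self.ff(x))  # gated residual, feed-forward
        return x


# Minimal usage: merging two styles with equal weight on the first example.
block = MultiStyleGatedBlock()
x = torch.randn(2, 10, 512)
w = torch.tensor([[0.5, 0.5, 0.0, 0.0],
                  [1.0, 0.0, 0.0, 0.0]])
out = block(x, w)  # (2, 10, 512)
```

Under this reading, passing fractional style weights at inference time is one way such a block could realize a cross-domain style merge without any training caption ever exhibiting the combined style.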