Bridging the Web Data and Fine-Grained Visual Recognition via Alleviating Label Noise and Domain Mismatch

Publication Type: Conference Proceeding
Proceedings of the 28th ACM International Conference on Multimedia, 2020, pp. 1735-1744
To distinguish the subtle differences among fine-grained categories, a large number of well-labeled images is typically required. However, manually annotating fine-grained categories is an extremely difficult task, as it usually demands professional knowledge. To this end, we propose to directly leverage web images for fine-grained visual recognition. Our work focuses on two critical issues in web images: "label noise" and "domain mismatch". Specifically, we propose an end-to-end deep denoising network (DDN) model that jointly addresses both problems during web image selection. To verify the effectiveness of the proposed approach, we first collect web images using the labels of fine-grained datasets. We then apply the proposed DDN model to remove noise and alleviate domain mismatch, and use the selected web images as the training set for learning fine-grained categorization models. Extensive experiments and ablation studies show that our approach achieves state-of-the-art performance and, at the same time, offers a new pipeline for fine-grained visual categorization that promises to be highly effective in real-world applications.