Extracting Multiple Visual Senses for Web Learning

Publication Type:
Journal Article
Citation:
IEEE Transactions on Multimedia, 2019, 21 (1), pp. 184 - 196
Issue Date:
2019-01-01
Abstract:
Labeled image datasets have played a critical role in high-level image understanding. However, the process of manual labeling is both time consuming and labor intensive. To reduce the dependence on manually labeled data, there have been increasing research efforts on learning visual classifiers by directly exploiting web images. One issue that limits their performance is the problem of polysemy. Existing unsupervised approaches attempt to reduce the influence of visual polysemy by filtering out irrelevant images, but do not directly address polysemy. To this end, in this paper, we present a multimodal framework that solves the problem of polysemy by allowing sense-specific diversity in search results. Specifically, we first discover a list of possible semantic senses from untagged corpora to retrieve sense-specific images. Then, we merge visually similar semantic senses and prune noise by using the retrieved images. Finally, we train one visual classifier for each selected semantic sense and use the learned sense-specific classifiers to distinguish multiple visual senses. Extensive experiments on classifying images into sense-specific categories and reranking search results demonstrate the superiority of our proposed approach.
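The pipeline described in the abstract (retrieve sense-specific images, merge visually similar senses, then train one classifier per remaining sense) can be sketched as follows. This is a minimal illustration only: the sense names, feature vectors, similarity threshold, and the nearest-centroid rule standing in for the paper's trained classifiers are all assumptions, not details from the paper.

```python
import math
import random

random.seed(0)

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def sample(base, n=20, noise=0.1):
    """Synthetic 'retrieved image features' clustered around a base vector."""
    return [[x + random.uniform(-noise, noise) for x in base] for _ in range(n)]

# Hypothetical stand-in data: 8-D image features retrieved for each
# semantic sense discovered for the query word "mouse" (illustrative
# names and numbers, not from the paper).
animal_base = [1.0] * 8
device_base = [1.0, -1.0] * 4  # orthogonal to animal_base

senses = {
    "mouse_animal": sample(animal_base),
    "mouse_rodent": sample([1.1 * x for x in animal_base]),  # visually similar sense
    "mouse_device": sample(device_base),
}

# Merge step: fold together senses whose image centroids are visually
# similar (cosine similarity above an assumed 0.95 threshold).
centroids = {name: centroid(feats) for name, feats in senses.items()}
merged = {}  # representative sense -> original senses folded into it
for name, c in centroids.items():
    for rep in merged:
        if cosine(c, centroids[rep]) > 0.95:
            merged[rep].append(name)
            break
    else:
        merged[name] = [name]

# Classifier step: one model per merged sense -- here simply the
# centroid of its pooled images, used as a nearest-centroid classifier.
models = {rep: centroid([v for n in names for v in senses[n]])
          for rep, names in merged.items()}

def classify(x):
    """Assign an image feature vector to its most similar visual sense."""
    return max(models, key=lambda rep: cosine(x, models[rep]))
```

With this toy data the two visually similar senses collapse into one, leaving two sense-specific classifiers that separate "animal" images from "device" images, which mirrors how sense-specific classifiers would distinguish multiple visual senses at query time.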