Extracting Multiple Visual Senses for Web Learning

Publication Type: Journal Article
Citation: IEEE Transactions on Multimedia, 2019, 21 (1), pp. 184-196
Issue Date: 2019-01-01
Abstract:
Labeled image datasets have played a critical role in high-level image understanding, but manual labeling is both time-consuming and labor-intensive. To reduce the dependence on manually labeled data, there have been increasing research efforts on learning visual classifiers directly from web images. One issue that limits their performance is visual polysemy: a single query term (e.g., "apple") can refer to several distinct visual concepts. Existing unsupervised approaches attempt to reduce the influence of visual polysemy by filtering out irrelevant images, but they do not address polysemy directly. To this end, this paper presents a multimodal framework that resolves polysemy by allowing sense-specific diversity in search results. Specifically, we first discover a list of possible semantic senses from untagged corpora and use them to retrieve sense-specific images. We then merge visually similar semantic senses and prune noisy ones using the retrieved images. Finally, we train one visual classifier for each selected semantic sense and use the learned sense-specific classifiers to distinguish multiple visual senses. Extensive experiments on classifying images into sense-specific categories and on reranking search results demonstrate the superiority of the proposed approach.
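The last two stages of the pipeline (merging visually similar senses, then training one classifier per surviving sense) can be illustrated with a short sketch. The code below is a minimal illustration, not the authors' implementation: it assumes sense discovery and image retrieval have already produced one feature matrix per candidate sense (e.g., CNN features per retrieved image), it stands in cosine similarity of mean features and a one-vs-rest linear SVM for the paper's actual merging criterion and classifier, and all names (merge_similar_senses, train_sense_classifiers, the "apple" senses) are hypothetical.

import numpy as np
from sklearn.svm import LinearSVC

def merge_similar_senses(sense_features, threshold=0.9):
    # Greedily merge senses whose mean visual features are nearly
    # identical under cosine similarity; a stand-in for the paper's
    # merging step, not its actual criterion.
    centroids = {}
    for sense, feats in sense_features.items():
        c = feats.mean(axis=0)
        centroids[sense] = c / np.linalg.norm(c)
    merged = {}
    for sense, feats in sense_features.items():
        match = next((t for t in merged
                      if centroids[sense] @ centroids[t] > threshold), None)
        if match is not None:
            merged[match] = np.vstack([merged[match], feats])
        else:
            merged[sense] = feats
    return merged

def train_sense_classifiers(sense_features):
    # One linear classifier per selected sense, trained one-vs-rest
    # against the images retrieved for every other sense.
    classifiers = {}
    for sense, feats in sense_features.items():
        negatives = np.vstack([f for s, f in sense_features.items() if s != sense])
        X = np.vstack([feats, negatives])
        y = np.concatenate([np.ones(len(feats)), np.zeros(len(negatives))])
        classifiers[sense] = LinearSVC().fit(X, y)
    return classifiers

# Example: two hypothetical senses of "apple", with 512-D features
# per retrieved image (randomly generated here as placeholders).
senses = {"apple_fruit": np.random.randn(60, 512),
          "apple_company": np.random.randn(60, 512)}
classifiers = train_sense_classifiers(merge_similar_senses(senses))

At test time, each image would be scored by all sense-specific classifiers, so the framework can both assign images to sense-specific categories and rerank search results, as the abstract describes.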