Exploiting textual queries for dynamically visual disambiguation

Sun, Z; Yao, Y; Xiao, J; Zhang, L; Zhang, J; Tang, Z

Publisher: Elsevier BV
Publication Type: Journal Article
Citation: Pattern Recognition, 2020, pp. 107620-107620
Issue Date: 2020-01-01

This item is open access.

The embargo period expires on 20 Jan 2022

Full metadata record

Field	Value	Language
dc.contributor.author	Sun, Z
dc.contributor.author	Yao, Y
dc.contributor.author	Xiao, J
dc.contributor.author	Zhang, L
dc.contributor.author	Zhang, J https://orcid.org/0000-0002-7240-3541
dc.contributor.author	Tang, Z
dc.date.accessioned	2020-10-06T02:14:15Z
dc.date.available	2020-10-06T02:14:15Z
dc.date.issued	2020-01-01
dc.identifier.citation	Pattern Recognition, 2020, pp. 107620-107620
dc.identifier.issn	0031-3203
dc.identifier.uri	http://hdl.handle.net/10453/143101
dc.description.abstract	© 2020. Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of current webly supervised models is visual polysemy. In this work, we present a novel framework that resolves visual polysemy by dynamically matching candidate text queries with retrieved images. Specifically, our proposed framework includes three major steps: we first discover candidate text queries, then dynamically select among them according to the keyword-based image search results, and finally employ the proposed saliency-guided deep multi-instance learning (MIL) network to remove outliers and learn classification models for visual disambiguation. Compared to existing methods, our proposed approach can identify the right visual senses, adapt to dynamic changes in the search results, remove outliers, and jointly learn the classification models. Extensive experiments and ablation studies on the CMU-Poly-30 and MIT-ISD datasets demonstrate the effectiveness of our proposed approach.
dc.language	en
dc.publisher	Elsevier BV
dc.relation.ispartof	Pattern Recognition
dc.relation.isbasedon	10.1016/j.patcog.2020.107620
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	0801 Artificial Intelligence and Image Processing, 0806 Information Systems, 0906 Electrical and Electronic Engineering
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Exploiting textual queries for dynamically visual disambiguation
dc.type	Journal Article
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0806 Information Systems
utslib.for	0906 Electrical and Electronic Engineering
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2022-01-20T00:00:00+1000Z
dc.date.updated	2020-10-06T02:14:10Z
pubs.publication-status	Published

Abstract:

© 2020. Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of current webly supervised models is visual polysemy. In this work, we present a novel framework that resolves visual polysemy by dynamically matching candidate text queries with retrieved images. Specifically, our proposed framework includes three major steps: we first discover candidate text queries, then dynamically select among them according to the keyword-based image search results, and finally employ the proposed saliency-guided deep multi-instance learning (MIL) network to remove outliers and learn classification models for visual disambiguation. Compared to existing methods, our proposed approach can identify the right visual senses, adapt to dynamic changes in the search results, remove outliers, and jointly learn the classification models. Extensive experiments and ablation studies on the CMU-Poly-30 and MIT-ISD datasets demonstrate the effectiveness of our proposed approach.
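
The abstract names multi-instance learning (MIL) as the mechanism for removing outliers but gives no details of the network. As a rough illustration of the general MIL idea only, and not the authors' saliency-guided model, the sketch below treats each retrieved image as a "bag" of region scores and keeps an image when its best-scoring region clears a threshold. The function names, the max-pooling rule, the scores, and the 0.5 threshold are all assumptions made for this example.

```python
from typing import List, Tuple

Bag = Tuple[str, List[float]]  # (image id, per-region relevance scores in [0, 1])


def bag_score(instance_scores: List[float]) -> float:
    """Max pooling: a bag is only as relevant as its most relevant instance."""
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    return max(instance_scores)


def filter_outlier_bags(bags: List[Bag], threshold: float = 0.5) -> List[Bag]:
    """Keep bags whose pooled score reaches the (hypothetical) threshold."""
    return [(name, scores) for name, scores in bags if bag_score(scores) >= threshold]


if __name__ == "__main__":
    # Made-up scores: img_a has one strongly relevant region, img_b has none.
    retrieved = [("img_a", [0.1, 0.9, 0.3]), ("img_b", [0.2, 0.1])]
    print(filter_outlier_bags(retrieved))
```

Max pooling reflects the standard MIL assumption that a positive bag needs only one positive instance; a learned model would produce the region scores rather than take them as given.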

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/143101