Kill two birds with one stone: Weakly-supervised neural network for image annotation and tag refinement

Zhang, J; Wu, Q; Shen, C; Lu, J

Kill two birds with one stone: Weakly-supervised neural network for image annotation and tag refinement

Zhang, J

Wu, Q Shen, C Lu, J

Permalink

Publication Type:: Conference Proceeding
Citation:: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018, pp. 7550 - 7557
Issue Date:: 2018-01-01

Closed Access

	Filename	Description	Size
	16445-16356-1-PB-IAAA-2018-Zhang.pdf	Published version	2.32 MB	Adobe PDF	View/Open

Copyright Clearance Process

Recently Added
In Progress
Closed Access

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, J https://orcid.org/0000-0002-7240-3541	en_US
dc.contributor.author	Wu, Q	en_US
dc.contributor.author	Shen, C	en_US
dc.contributor.author	Lu, J	en_US
dc.date.issued	2018-01-01	en_US
dc.identifier.citation	32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018, pp. 7550 - 7557	en_US
dc.identifier.isbn	9781577358008	en_US
dc.identifier.uri	http://hdl.handle.net/10453/129765
dc.description.abstract	Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. The number of social images has exploded by the wide adoption of social networks, and people like to share their comments about them. These comments can be a description of the image, or some objects, attributes, scenes in it, which are normally used as the user-provided tags. However, it is well-known that user-provided tags are incomplete and imprecise to some extent. Directly using them can damage the performance of related applications, such as the image annotation and retrieval. In this paper, we propose to learn an image annotation model and refine the user-provided tags simultaneously in a weakly-supervised manner. The deep neural network is utilized as the image feature learning and backbone annotation model, while visual consistency, semantic dependency, and user-error sparsity are introduced as the constraints at the batch level to alleviate the tag noise. Therefore, our model is highly flexible and stable to handle large-scale image sets. Experimental results on two benchmark datasets indicate that our proposed model achieves the best performance compared to the state-of-the-art methods.	en_US
dc.relation.ispartof	32nd AAAI Conference on Artificial Intelligence, AAAI 2018	en_US
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Kill two birds with one stone: Weakly-supervised neural network for image annotation and tag refinement	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	closed_access	*
pubs.publication-status	Published	en_US

Abstract:

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. The number of social images has exploded by the wide adoption of social networks, and people like to share their comments about them. These comments can be a description of the image, or some objects, attributes, scenes in it, which are normally used as the user-provided tags. However, it is well-known that user-provided tags are incomplete and imprecise to some extent. Directly using them can damage the performance of related applications, such as the image annotation and retrieval. In this paper, we propose to learn an image annotation model and refine the user-provided tags simultaneously in a weakly-supervised manner. The deep neural network is utilized as the image feature learning and backbone annotation model, while visual consistency, semantic dependency, and user-error sparsity are introduced as the constraints at the batch level to alleviate the tag noise. Therefore, our model is highly flexible and stable to handle large-scale image sets. Experimental results on two benchmark datasets indicate that our proposed model achieves the best performance compared to the state-of-the-art methods.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/129765