Understanding deep representations learned in modeling users likes

Guntuku, SC; Zhou, JT; Roy, S; Lin, W; Tsang, IW

Publication Type: Journal Article
Citation: IEEE Transactions on Image Processing, 2016, 25 (8), pp. 3762 - 3774
Issue Date: 2016-08-01

Closed Access

	Filename	Description	Size
	un.pdf	Published Version	2.79 MB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Guntuku, SC	en_US
dc.contributor.author	Zhou, JT	en_US
dc.contributor.author	Roy, S	en_US
dc.contributor.author	Lin, W	en_US
dc.contributor.author	Tsang, IW https://orcid.org/0000-0001-8095-4637	en_US
dc.date.issued	2016-08-01	en_US
dc.identifier.citation	IEEE Transactions on Image Processing, 2016, 25 (8), pp. 3762 - 3774	en_US
dc.identifier.issn	1057-7149	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121801
dc.description.abstract	© 1992-2012 IEEE. Automatically understanding and discriminating different users' liking for an image is a challenging problem. This is because the relationship between image features (even semantic ones extracted by existing tools, viz., faces, objects, and so on) and users' likes is non-linear and influenced by several subtle factors. This paper presents a deep bi-modal knowledge representation of images based on their visual content and associated tags (text). A mapping step between the different levels of visual and textual representations allows for the transfer of semantic knowledge between the two modalities. Feature selection is applied before learning the deep representation to identify the features that are important for a user to like an image. The proposed representation is shown to be effective in discriminating users based on the images they like and in recommending images that a given user likes, outperforming state-of-the-art feature representations by ∼15%-20%. Beyond this test-set performance, an attempt is made to qualitatively understand the representations learned by the deep architecture used to model user likes.	en_US
dc.relation	http://purl.org/au-research/grants/arc/FT130100746
dc.relation	http://purl.org/au-research/grants/arc/LP150100671
dc.relation.ispartof	IEEE Transactions on Image Processing	en_US
dc.relation.isbasedon	10.1109/TIP.2016.2576278	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.subject.mesh	Face	en_US
dc.subject.mesh	Humans	en_US
dc.subject.mesh	Image Interpretation, Computer-Assisted	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Semantics	en_US
dc.subject.mesh	Artificial Intelligence	en_US
dc.subject.mesh	Image Processing, Computer-Assisted	en_US
dc.subject.mesh	Information Storage and Retrieval	en_US
dc.subject.mesh	Pattern Recognition, Automated	en_US
dc.title	Understanding deep representations learned in modeling users likes	en_US
dc.type	Journal Article
utslib.citation.volume	8	en_US
utslib.citation.volume	25	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
utslib.for	1702 Cognitive Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	8	en_US
pubs.publication-status	Published	en_US
pubs.volume	25	en_US
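
The abstract mentions two steps whose details are not given in this record: feature selection before representation learning, and a mapping between visual and textual (tag) representations. The sketch below is purely illustrative and is NOT the authors' method. It assumes a simple variance-based selection as a stand-in for the unspecified feature-selection step, and a ridge-regression map from visual to tag features as one common way of relating two modalities. The function names select_top_variance and fit_linear_map, the regularisation weight lam, and the synthetic data are all hypothetical.

```python
import numpy as np

def select_top_variance(X, k):
    """Hypothetical stand-in for feature selection: keep the k highest-variance columns."""
    variances = X.var(axis=0)
    # argsort is ascending, so the last k indices are the highest-variance columns;
    # sorting them again preserves the original column order.
    idx = np.sort(np.argsort(variances)[-k:])
    return X[:, idx], idx

def fit_linear_map(V, T, lam=1e-3):
    """Ridge-regression map W such that V @ W approximates T (visual -> text).

    Closed form: W = (V^T V + lam * I)^{-1} V^T T.
    """
    d = V.shape[1]
    return np.linalg.solve(V.T @ V + lam * np.eye(d), V.T @ T)

# Synthetic data (assumption): 200 images, 50 visual features, 10 tag features.
rng = np.random.default_rng(0)
n_images, d_visual, d_text = 200, 50, 10
visual = rng.normal(size=(n_images, d_visual))
visual[:, :20] *= 3.0                # the first 20 visual features have high variance
W_true = rng.normal(size=(20, d_text))
tags = visual[:, :20] @ W_true       # tag features depend only on those 20 columns

# Step 1: select features; Step 2: learn the visual -> text mapping.
visual_sel, kept = select_top_variance(visual, k=20)
W = fit_linear_map(visual_sel, tags)

# Relative reconstruction error of the tag features predicted from visual features.
pred = visual_sel @ W
rel_err = np.linalg.norm(pred - tags) / np.linalg.norm(tags)
print(kept[:5], round(float(rel_err), 6))
```

A linear map is used here only because it is short and has a closed-form solution; the paper describes a deep architecture, which this sketch does not attempt to reproduce.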

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/121801