Understanding deep representations learned in modeling users likes

Publication Type:
Journal Article
IEEE Transactions on Image Processing, 2016, 25(8), pp. 3762–3774
© 1992-2012 IEEE. Automatically understanding and discriminating different users' liking for an image is a challenging problem, because the relationship between image features (even semantic ones extracted by existing tools, viz., faces, objects, and so on) and users' likes is non-linear and influenced by several subtle factors. This paper presents a deep bi-modal knowledge representation of images based on their visual content and associated tags (text). A mapping step between the different levels of visual and textual representations allows semantic knowledge to be transferred between the two modalities. Feature selection is applied before learning the deep representation, to identify the features that are important for a user to like an image. The proposed representation is shown to be effective both in discriminating users based on the images they like and in recommending images that a given user will like, outperforming state-of-the-art feature representations by ~15%–20%. Beyond this test-set performance, an attempt is made to qualitatively understand the representations learned by the deep architecture used to model user likes.
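The pipeline the abstract describes (feature selection, a per-modality representation, and fusion of the visual and textual branches into one bi-modal representation) can be sketched as follows. This is a minimal illustrative sketch, not the paper's architecture: all dimensions, the variance-based selection criterion, and the single-hidden-layer branches are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative inputs: visual and tag (text) features for 4 images.
# Feature dimensions are placeholders, not taken from the paper.
visual = rng.normal(size=(4, 64))
tags = rng.normal(size=(4, 32))

def select_features(x, k):
    """Keep the k highest-variance columns (a stand-in for the paper's
    feature-selection step; the actual criterion may differ)."""
    idx = np.argsort(x.var(axis=0))[-k:]
    return x[:, idx]

def relu(x):
    return np.maximum(x, 0.0)

# Feature selection before representation learning.
visual_sel = select_features(visual, 48)

# One hidden layer per modality, then a shared scoring layer over the
# concatenated (bi-modal) representation. Weights are random here; in
# practice they would be learned from a user's liked images.
W_v = rng.normal(size=(48, 16))
W_t = rng.normal(size=(32, 16))
W_f = rng.normal(size=(32, 1))   # 16 + 16 fused dims -> "like" score

h_v = relu(visual_sel @ W_v)                # visual branch representation
h_t = relu(tags @ W_t)                      # textual branch representation
fused = np.concatenate([h_v, h_t], axis=1)  # bi-modal representation
score = fused @ W_f                         # per-image "like" score

print(fused.shape, score.shape)
```

Ranking images by `score` for a given user corresponds to the recommendation use case mentioned in the abstract; the fused representation itself is what the paper analyses qualitatively.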