Weakly supervised human fixations prediction

Zhang, L; Li, X; Nie, L; Yang, Y; Xia, Y


Publication Type: Journal Article
Citation: IEEE Transactions on Cybernetics, 2016, 46 (1), pp. 258 - 269
Issue Date: 2016-01-01

Closed Access

Filename	Description	Size	Format
07152897.pdf	Published Version	1.96 MB	Adobe PDF


Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, L	en_US
dc.contributor.author	Li, X	en_US
dc.contributor.author	Nie, L	en_US
dc.contributor.author	Yang, Y https://orcid.org/0000-0001-5528-0546	en_US
dc.contributor.author	Xia, Y	en_US
dc.date.issued	2016-01-01	en_US
dc.identifier.citation	IEEE Transactions on Cybernetics, 2016, 46 (1), pp. 258 - 269	en_US
dc.identifier.issn	2168-2267	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121581
dc.description.abstract	© 2015 IEEE. Automatically predicting human eye fixations is a useful technique that can facilitate many multimedia applications, e.g., image retrieval, action recognition, and photo retargeting. Conventional approaches suffer from two drawbacks. First, psychophysical experiments show that an object-level interpretation of scenes influences eye movements significantly. Most existing saliency models rely on object detectors, and therefore only a few prespecified categories can be discovered. Second, the relative displacement of objects influences their saliency remarkably, but current models cannot describe it explicitly. To solve these problems, this paper proposes weakly supervised fixations prediction, which leverages image labels to improve the accuracy of human fixations prediction. The proposed model hierarchically discovers objects as well as their spatial configurations. Starting from the raw image pixels, we sample superpixels in an image; seamless object descriptors termed object-level graphlets (oGLs) are then generated by random walks on the superpixel mosaic. Then, a manifold embedding algorithm is proposed to encode image labels into oGLs, and the response map of each prespecified object is computed accordingly. On the basis of the object-level response maps, we propose spatial-level graphlets (sGLs) to model the relative positions among objects. Afterward, eye tracking data are employed to integrate these sGLs for predicting human eye fixations. Thorough experimental results demonstrate the advantage of the proposed method over the state of the art.	en_US
dc.relation.ispartof	IEEE Transactions on Cybernetics	en_US
dc.relation.isbasedon	10.1109/TCYB.2015.2400821	en_US
dc.subject.classification	Artificial Intelligence & Image Processing	en_US
dc.subject.mesh	Humans	en_US
dc.subject.mesh	Models, Statistical	en_US
dc.subject.mesh	Psychophysics	en_US
dc.subject.mesh	Fixation, Ocular	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Image Processing, Computer-Assisted	en_US
dc.subject.mesh	Supervised Machine Learning	en_US
dc.title	Weakly supervised human fixations prediction	en_US
dc.type	Journal Article
utslib.citation.volume	1	en_US
utslib.citation.volume	46	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	0102 Applied Mathematics	en_US
utslib.for	0906 Electrical and Electronic Engineering	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.issue	1	en_US
pubs.publication-status	Published	en_US
pubs.volume	46	en_US

Abstract:

© 2015 IEEE. Automatically predicting human eye fixations is a useful technique that can facilitate many multimedia applications, e.g., image retrieval, action recognition, and photo retargeting. Conventional approaches suffer from two drawbacks. First, psychophysical experiments show that an object-level interpretation of scenes influences eye movements significantly. Most existing saliency models rely on object detectors, and therefore only a few prespecified categories can be discovered. Second, the relative displacement of objects influences their saliency remarkably, but current models cannot describe it explicitly. To solve these problems, this paper proposes weakly supervised fixations prediction, which leverages image labels to improve the accuracy of human fixations prediction. The proposed model hierarchically discovers objects as well as their spatial configurations. Starting from the raw image pixels, we sample superpixels in an image; seamless object descriptors termed object-level graphlets (oGLs) are then generated by random walks on the superpixel mosaic. Then, a manifold embedding algorithm is proposed to encode image labels into oGLs, and the response map of each prespecified object is computed accordingly. On the basis of the object-level response maps, we propose spatial-level graphlets (sGLs) to model the relative positions among objects. Afterward, eye tracking data are employed to integrate these sGLs for predicting human eye fixations. Thorough experimental results demonstrate the advantage of the proposed method over the state of the art.
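
The abstract says that object-level graphlets are generated by random walking on the superpixel mosaic, but it gives no algorithmic detail. The following is a minimal illustrative sketch of that one idea, not the authors' implementation. It assumes the mosaic is represented as an adjacency map between superpixel indices, uses a regular grid as a stand-in for real superpixels, and treats a graphlet simply as a small connected set of superpixels. The names `grid_adjacency`, `sample_graphlet`, and `is_connected` are hypothetical.

```python
import random


def grid_adjacency(rows, cols):
    """Build a 4-connected adjacency map for a rows x cols grid.

    Assumption: the grid stands in for a superpixel mosaic; real
    superpixels would have irregular neighborhoods.
    """
    adjacency = {}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            neighbors = []
            if r > 0:
                neighbors.append(node - cols)
            if r < rows - 1:
                neighbors.append(node + cols)
            if c > 0:
                neighbors.append(node - 1)
            if c < cols - 1:
                neighbors.append(node + 1)
            adjacency[node] = neighbors
    return adjacency


def sample_graphlet(adjacency, start, size, rng):
    """Random-walk from `start`, collecting up to `size` distinct superpixels.

    Each newly collected node is adjacent to the walker's previous
    position, which was already collected, so the result is connected.
    """
    visited = [start]
    current = start
    max_steps = 50 * size  # hypothetical cap so the walk always terminates
    for _ in range(max_steps):
        if len(visited) == size:
            break
        neighbors = adjacency[current]
        if not neighbors:
            break
        current = rng.choice(neighbors)
        if current not in visited:
            visited.append(current)
    return frozenset(visited)


def is_connected(adjacency, nodes):
    """Check that `nodes` induce a connected subgraph of `adjacency`."""
    nodes = set(nodes)
    first = next(iter(nodes))
    seen = {first}
    stack = [first]
    while stack:
        u = stack.pop()
        for v in adjacency[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


if __name__ == "__main__":
    rng = random.Random(0)
    mosaic = grid_adjacency(4, 5)
    graphlets = {sample_graphlet(mosaic, s, 4, rng) for s in mosaic}
    print(len(graphlets), "distinct graphlets sampled")
```

In this sketch, repeating the walk from many start nodes yields a pool of small connected superpixel groups. How the paper describes each group's appearance and embeds image labels into it is not stated in the abstract and is not modeled here.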

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/121581