Weakly supervised human fixations prediction

Publication Type: Journal Article
IEEE Transactions on Cybernetics, 2016, 46 (1), pp. 258 - 269
© 2015 IEEE. Automatically predicting human eye fixations is a useful technique that can facilitate many multimedia applications, e.g., image retrieval, action recognition, and photo retargeting. Conventional approaches suffer from two drawbacks. First, psychophysical experiments show that an object-level interpretation of scenes significantly influences eye movements, yet most existing saliency models rely on object detectors and can therefore discover only a few prespecified categories. Second, the relative displacement of objects strongly influences their saliency, but current models cannot describe it explicitly. To solve these problems, this paper proposes weakly supervised fixations prediction, which leverages image labels to improve the accuracy of human fixations prediction. The proposed model hierarchically discovers objects as well as their spatial configurations. Starting from the raw image pixels, we sample superpixels in an image, and seamless object descriptors termed object-level graphlets (oGLs) are generated by random walks on the superpixel mosaic. Then, a manifold embedding algorithm is proposed to encode image labels into oGLs, and the response map of each prespecified object is computed accordingly. On the basis of the object-level response maps, we propose spatial-level graphlets (sGLs) to model the relative positions among objects. Afterward, eye-tracking data are employed to integrate these sGLs for predicting human eye fixations. Extensive experimental results demonstrate the advantage of the proposed method over the state of the art.
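The abstract does not give the sampling procedure in detail, so the following is only a minimal sketch of the general idea of generating a graphlet by a random walk over adjacent superpixels. Everything in it is an assumption made for illustration: the superpixel mosaic is replaced by a 4-connected grid, and the names `grid_adjacency` and `sample_graphlet` are hypothetical rather than taken from the paper. A graphlet is represented here simply as a connected set of superpixel indices.

```python
import random


def grid_adjacency(rows, cols):
    """Build a 4-connected grid graph as a stand-in for a superpixel mosaic.

    Assumption: real superpixels have irregular neighbourhoods; a regular
    grid is used only to keep the sketch self-contained.
    """
    adj = {}
    for r in range(rows):
        for c in range(cols):
            node = r * cols + c
            nbrs = []
            if r > 0:
                nbrs.append(node - cols)
            if r < rows - 1:
                nbrs.append(node + cols)
            if c > 0:
                nbrs.append(node - 1)
            if c < cols - 1:
                nbrs.append(node + 1)
            adj[node] = nbrs
    return adj


def sample_graphlet(adj, size, rng):
    """Random-walk over the adjacency graph until `size` distinct nodes are seen.

    Assumes the graph is connected and every node has at least one neighbour.
    The walk only ever steps between adjacent nodes, so the visited set is
    always a connected subgraph.
    """
    if size < 1 or size > len(adj):
        raise ValueError("size must be between 1 and the number of nodes")
    current = rng.choice(sorted(adj))
    visited = [current]
    seen = {current}
    while len(visited) < size:
        current = rng.choice(adj[current])
        if current not in seen:
            seen.add(current)
            visited.append(current)
    return visited


if __name__ == "__main__":
    adjacency = grid_adjacency(4, 5)
    generator = random.Random(0)  # fixed seed for repeatability
    print(sample_graphlet(adjacency, 4, generator))
```

In a fuller pipeline, each sampled graphlet would be paired with appearance features of its superpixels before any label-encoding step; that part is omitted here because the abstract does not specify it.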