Visual saliency prediction for stereoscopic image

Cheng, Hao

Publication Type: Thesis
Issue Date: 2018

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Cheng, Hao
dc.date.accessioned	2018-10-08T03:37:10Z
dc.date.available	2018-10-08T03:37:10Z
dc.date.issued	2018
dc.identifier.uri	http://hdl.handle.net/10453/127984
dc.description	University of Technology Sydney. Faculty of Engineering and Information Technology.	en_AU
dc.description.abstract	Saliency prediction is considered to be key to attentional processing. Attention improves learning and survival by compelling creatures to focus their limited cognitive resources and perceptive abilities on the most interesting region of the available sensory data. Computational models for saliency prediction are widely used in various fields of computer vision, such as object detection, scene recognition, and robot vision. In recent years, several comprehensive and well-performing models have been developed. However, these models are only suitable for 2D content. With the rapid development of 3D imaging technology, an increasing number of applications are emerging that rely on 3D images and video. In turn, demand for computational saliency models that can handle 3D content is growing. Compared to the significant progress in 2D saliency research, studies that consider the depth factor as part of stereoscopic saliency analysis are rather limited. Thus, the role of the depth factor in stereoscopic saliency analysis is still relatively unexplored. The aim of this thesis is to fill this gap in the literature by exploring the role of depth factors in three aspects of stereoscopic saliency: how depth factors might be used to leverage stereoscopic saliency detection; how to build a stereoscopic saliency model based on the mechanisms of human stereoscopic vision; and how to implement a stereoscopic saliency model that can adjust to the particular aspect of human stereoscopic vision reflected in specific 3D content. To meet these three aims, this thesis includes three distinct computational models for stereoscopic saliency prediction based on the past and present outcomes of my research. The contributions of the thesis are as follows: Chapter 3 presents a preliminary saliency model for stereoscopic images. This model exploits depth information and treats the depth factor of an image as a weight to leverage saliency analysis. First, low-level features from the color and depth maps are extracted. Then, to extract the structural information from the depth map, the surrounding Boolean-based map is computed as a weight to enhance the low-level features. Lastly, a stereoscopic center prior enhancement based on the saliency probability distribution in the depth map is used to determine the final saliency. The model presented in Chapter 4 predicts stereoscopic visual saliency using stereo contrast and stereo focus. The stereo contrast submodel measures stereo saliency based on color, depth contrast, and the pop-out effect. The stereo focus submodel measures the degree of focus based on monocular vision and comfort zones. Multi-scale fusion is then used to generate a map for each of the submodels, and a Bayesian integration scheme combines both maps into a stereo saliency map. However, the stereoscopic saliency model presented in Chapter 4 does not explain all the phenomena in stereoscopic content. So, to improve the model's robustness, Chapter 5 includes a computational model for stereoscopic 3D visual saliency with three submodels based on the three mechanisms of the human vision system: the pop-out effect, comfort zones, and the background effect. Each mechanism provides useful cues for stereoscopic saliency analysis depending on the nature of the stereoscopic content. Hence, the model in Chapter 5 incorporates a selection strategy to accurately determine which submodel should be used to process an image. The approach is implemented within a purpose-built, multi-feature analysis framework that assesses three features: surrounding region, color and depth contrast, and points of interest. All three models were verified through experiments with two eye-tracking databases. Each outperforms the state-of-the-art saliency models.	en_AU
dc.format	Thesis (PhD)
dc.language.iso	en_AU	en_AU
dc.relation	https://opus.lib.uts.edu.au/bitstream/10453/127984/2/02whole.pdf
dc.rights	The author owns the copyright in this thesis including all reproduction and reuse rights for the work. The work may not be altered without the permission of the copyright owner. Attribution is essential when quoting or paraphrasing from this thesis.
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	au.edu.uts.lib/ppc
dc.subject	Saliency prediction.	en_AU
dc.subject	Stereoscopic saliency model.	en_AU
dc.subject	Computational saliency model.	en_AU
dc.subject	Stereoscopic 3D visual saliency.	en_AU
dc.title	Visual saliency prediction for stereoscopic image	en_AU
dc.type	Thesis	en_AU
utslib.copyright.status	open_access

Abstract:

Saliency prediction is considered to be key to attentional processing. Attention improves learning and survival by compelling creatures to focus their limited cognitive resources and perceptive abilities on the most interesting region of the available sensory data. Computational models for saliency prediction are widely used in various fields of computer vision, such as object detection, scene recognition, and robot vision. In recent years, several comprehensive and well-performing models have been developed. However, these models are only suitable for 2D content. With the rapid development of 3D imaging technology, an increasing number of applications are emerging that rely on 3D images and video. In turn, demand for computational saliency models that can handle 3D content is growing.

Compared to the significant progress in 2D saliency research, studies that consider the depth factor as part of stereoscopic saliency analysis are rather limited. Thus, the role of the depth factor in stereoscopic saliency analysis is still relatively unexplored. The aim of this thesis is to fill this gap in the literature by exploring the role of depth factors in three aspects of stereoscopic saliency: how depth factors might be used to leverage stereoscopic saliency detection; how to build a stereoscopic saliency model based on the mechanisms of human stereoscopic vision; and how to implement a stereoscopic saliency model that can adjust to the particular aspect of human stereoscopic vision reflected in specific 3D content. To meet these three aims, this thesis includes three distinct computational models for stereoscopic saliency prediction based on the past and present outcomes of my research. The contributions of the thesis are as follows:

Chapter 3 presents a preliminary saliency model for stereoscopic images. This model exploits depth information and treats the depth factor of an image as a weight to leverage saliency analysis. First, low-level features from the color and depth maps are extracted. Then, to extract the structural information from the depth map, the surrounding Boolean-based map is computed as a weight to enhance the low-level features. Lastly, a stereoscopic center prior enhancement based on the saliency probability distribution in the depth map is used to determine the final saliency.

The model presented in Chapter 4 predicts stereoscopic visual saliency using stereo contrast and stereo focus. The stereo contrast submodel measures stereo saliency based on color, depth contrast, and the pop-out effect. The stereo focus submodel measures the degree of focus based on monocular vision and comfort zones. Multi-scale fusion is then used to generate a map for each of the submodels, and a Bayesian integration scheme combines both maps into a stereo saliency map.

However, the stereoscopic saliency model presented in Chapter 4 does not explain all the phenomena in stereoscopic content. So, to improve the model's robustness, Chapter 5 includes a computational model for stereoscopic 3D visual saliency with three submodels based on the three mechanisms of the human vision system: the pop-out effect, comfort zones, and the background effect. Each mechanism provides useful cues for stereoscopic saliency analysis depending on the nature of the stereoscopic content. Hence, the model in Chapter 5 incorporates a selection strategy to accurately determine which submodel should be used to process an image. The approach is implemented within a purpose-built, multi-feature analysis framework that assesses three features: surrounding region, color and depth contrast, and points of interest.

All three models were verified through experiments with two eye-tracking databases. Each outperforms the state-of-the-art saliency models.
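
The abstract does not specify the Bayesian integration scheme used in Chapter 4 to combine the stereo contrast and stereo focus maps. As a rough illustration only, the sketch below shows one simple way two saliency maps can be fused in a Bayesian manner: each normalized map value is treated as an independent per-pixel probability of saliency, and the two are combined under a uniform prior. The function names `normalize` and `fuse_bayes`, the independence assumption, and the uniform prior are all assumptions made for this example and are not taken from the thesis.

```python
from typing import List

Map = List[List[float]]  # a saliency map as rows of float values


def normalize(m: Map) -> Map:
    """Rescale a map to [0, 1]; a constant map becomes all 0.5 (uninformative)."""
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.5 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def fuse_bayes(contrast: Map, focus: Map, eps: float = 1e-6) -> Map:
    """Fuse two maps per pixel, assuming independent evidence and a uniform prior.

    With p and q read as probabilities of "salient", the posterior is
    p*q / (p*q + (1-p)*(1-q)). This is an illustrative rule, not the
    thesis's actual scheme.
    """
    c, f = normalize(contrast), normalize(focus)
    fused = []
    for c_row, f_row in zip(c, f):
        out_row = []
        for p, q in zip(c_row, f_row):
            # Clamp away from 0 and 1 so the denominator never becomes zero.
            p = min(max(p, eps), 1.0 - eps)
            q = min(max(q, eps), 1.0 - eps)
            out_row.append(p * q / (p * q + (1.0 - p) * (1.0 - q)))
        fused.append(out_row)
    return fused


if __name__ == "__main__":
    # Hypothetical 2x2 maps standing in for the two submodel outputs.
    stereo_contrast = [[0.0, 1.0], [0.5, 0.5]]
    stereo_focus = [[0.0, 1.0], [1.0, 0.0]]
    for row in fuse_bayes(stereo_contrast, stereo_focus):
        print([round(v, 3) for v in row])
```

Under this rule, pixels where both maps agree are pushed toward 0 or 1, while a value of 0.5 in one map is neutral and leaves the other map's value unchanged. A real implementation would operate on image arrays and would likely estimate likelihoods from the data rather than reading map values directly as probabilities.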

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/127984