Exploring viewport features for semi-supervised saliency prediction in omnidirectional images

Publisher:
Elsevier
Publication Type:
Journal Article
Citation:
Image and Vision Computing, 2023, 129
Issue Date:
2023-01-01
Abstract:
Compared with the annotated data available for 2D image saliency prediction, the annotated data for training omnidirectional image (or 360° image) saliency prediction models are scarce. Most existing fully-supervised saliency prediction methods for omnidirectional images (ODIs) follow a two-stage scheme: first training on a large labeled 2D image saliency dataset and then fine-tuning on a small labeled ODI saliency dataset. However, this strategy is time-consuming and may inadequately mine the visual features inherent in ODIs. To explore visual attributes specific to ODIs and to address the shortage of labeled ODIs, we propose an end-to-end semi-supervised network for ODI saliency prediction, named VFNet, which relies on viewport features and uses only ODIs as training data. Concretely, we adopt consistency regularization as our semi-supervised learning framework, enforcing consistency between the predictions of the main and auxiliary saliency inference networks in VFNet. Tailored to ODIs, we introduce a new form of perturbation, DropView, to improve the effectiveness of consistency regularization: by randomly dropping out different 360° cubemap viewport features before the auxiliary saliency inference network, DropView enhances the robustness of the final ODI saliency prediction. To adaptively fuse the equirectangular and cubemap viewport features according to their contributions, we introduce a Viewport Feature Adaptive Integration (VFAI) module and deploy it at multiple levels of VFNet to strengthen its feature encoding capacity. Extensive experiments on two publicly available datasets demonstrate that, with fewer labeled training data, VFNet achieves performance competitive with state-of-the-art fully-supervised methods.
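A minimal sketch of the DropView perturbation and the consistency objective described in the abstract, assuming a PyTorch implementation. The function names, the (B, V, C, H, W) viewport-feature layout, the drop probability, and the use of MSE as the consistency criterion are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F


def drop_view(viewport_feats: torch.Tensor, drop_prob: float = 0.5) -> torch.Tensor:
    """Randomly zero out entire cubemap viewport feature maps.

    viewport_feats: (B, V, C, H, W) tensor holding features of V viewports.
    drop_prob: probability of dropping each viewport (assumed value).
    """
    if drop_prob <= 0.0:
        return viewport_feats
    b, v = viewport_feats.shape[:2]
    # Keep each viewport with probability (1 - drop_prob).
    keep = (torch.rand(b, v, device=viewport_feats.device) > drop_prob).float()
    # Guarantee that at least one viewport survives per sample.
    keep[keep.sum(dim=1) == 0, 0] = 1.0
    return viewport_feats * keep.view(b, v, 1, 1, 1)


def consistency_loss(pred_main: torch.Tensor, pred_aux: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between the main and auxiliary saliency maps;
    the main branch is treated as a detached target (MSE chosen here only
    for illustration)."""
    return F.mse_loss(pred_aux, pred_main.detach())
```

In such a setup, the auxiliary branch would receive the perturbed features from drop_view on unlabeled ODIs, while the supervised saliency loss is computed only where ground-truth fixation maps exist, so the consistency term lets the network learn from unlabeled 360° images.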