RGB-D Saliency Detection via Cascaded Mutual Information Minimization

Zhang, J; Fan, D-P; Dai, Y; Yu, X; Zhong, Y; Barnes, N; Shao, L

Publisher: IEEE
Publication Type: Conference Proceeding
Citation: 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2022, 00, pp. 4318-4327
Issue Date: 2022-02-28

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, J
dc.contributor.author	Fan, D-P
dc.contributor.author	Dai, Y
dc.contributor.author	Yu, X https://orcid.org/0000-0002-0269-5649
dc.contributor.author	Zhong, Y
dc.contributor.author	Barnes, N
dc.contributor.author	Shao, L
dc.date	2021-10-10
dc.date.accessioned	2022-04-11T22:46:11Z
dc.date.available	2022-04-11T22:46:11Z
dc.date.issued	2022-02-28
dc.identifier.citation	2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2022, 00, pp. 4318-4327
dc.identifier.uri	http://hdl.handle.net/10453/156085
dc.description.abstract	Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning. In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to explicitly model the multi-modal information between RGB images and depth data. Specifically, we first map the features of each modality to a lower-dimensional feature vector, and adopt mutual information minimization as a regularizer to reduce the redundancy between appearance features from RGB and geometric features from depth. We then perform multi-stage cascaded learning to impose the mutual information minimization constraint at every stage of the network. Extensive experiments on benchmark RGB-D saliency datasets illustrate the effectiveness of our framework. Further, to foster the development of this field, we contribute the largest (7× larger than NJU2K) COME15K dataset, which contains 15,625 image pairs with high-quality polygon-/scribble-/object-/instance-/rank-level annotations. Based on these rich labels, we additionally construct four new benchmarks with strong baselines and observe some interesting phenomena, which can motivate future model design. Source code and dataset are available at https://github.com/JingZhang617/cascaded_rgbd_sod.
dc.language	en
dc.publisher	IEEE
dc.relation.ispartof	2021 IEEE/CVF International Conference on Computer Vision (ICCV)
dc.relation.ispartof	2021 IEEE/CVF International Conference on Computer Vision
dc.relation.isbasedon	10.1109/iccv48922.2021.00430
dc.rights	info:eu-repo/semantics/openAccess
dc.title	RGB-D Saliency Detection via Cascaded Mutual Information Minimization
dc.type	Conference Proceeding
utslib.citation.volume	00
utslib.location.activity	Montreal, QC, Canada
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
dc.date.updated	2022-04-11T22:46:10Z
pubs.finish-date	2021-10-17
pubs.publication-status	Published
pubs.start-date	2021-10-10
pubs.volume	00

Abstract:

Existing RGB-D saliency detection models do not explicitly encourage RGB and depth to achieve effective multi-modal learning. In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to explicitly model the multi-modal information between RGB images and depth data. Specifically, we first map the features of each modality to a lower-dimensional feature vector, and adopt mutual information minimization as a regularizer to reduce the redundancy between appearance features from RGB and geometric features from depth. We then perform multi-stage cascaded learning to impose the mutual information minimization constraint at every stage of the network. Extensive experiments on benchmark RGB-D saliency datasets illustrate the effectiveness of our framework. Further, to foster the development of this field, we contribute the largest (7× larger than NJU2K) COME15K dataset, which contains 15,625 image pairs with high-quality polygon-/scribble-/object-/instance-/rank-level annotations. Based on these rich labels, we additionally construct four new benchmarks with strong baselines and observe some interesting phenomena, which can motivate future model design. Source code and dataset are available at https://github.com/JingZhang617/cascaded_rgbd_sod.
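
Illustrative sketch (not taken from the paper or its repository): the abstract describes using mutual information between low-dimensional RGB and depth feature vectors as a regularizer. The Python below shows one way such a penalty could be computed, assuming the two feature sets are jointly Gaussian so that the mutual information has a closed form. The estimator, the function names (covariance, log_det, gaussian_mi, regularized_loss), the ridge term eps, and the weight value are all assumptions made for illustration; the authors' actual estimator may differ.

```python
import math
import random


def covariance(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def log_det(matrix, eps=1e-6):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky.

    eps is a small ridge added to the diagonal for numerical stability
    (an assumption of this sketch).
    """
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] + eps - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return 2.0 * sum(math.log(lower[i][i]) for i in range(d))


def gaussian_mi(feats_a, feats_b):
    """I(A;B) = 0.5 * (log|S_a| + log|S_b| - log|S_ab|), joint-Gaussian case."""
    joint = [a + b for a, b in zip(feats_a, feats_b)]  # concatenate per sample
    return 0.5 * (log_det(covariance(feats_a))
                  + log_det(covariance(feats_b))
                  - log_det(covariance(joint)))


def regularized_loss(task_loss, feats_a, feats_b, weight=0.1):
    """Task loss plus a weighted mutual-information penalty (hypothetical form)."""
    return task_loss + weight * gaussian_mi(feats_a, feats_b)


# Synthetic stand-ins for low-dimensional RGB and depth feature vectors.
rng = random.Random(0)
rgb = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
depth_indep = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
depth_redundant = [[x + 0.1 * rng.gauss(0, 1) for x in row] for row in rgb]

mi_indep = gaussian_mi(rgb, depth_indep)          # close to zero
mi_redundant = gaussian_mi(rgb, depth_redundant)  # much larger
print(f"independent: {mi_indep:.3f}  redundant: {mi_redundant:.3f}")
```

In this toy setting, depth features that nearly copy the RGB features receive a much larger penalty than independent ones, which is the kind of redundancy such a regularizer is meant to discourage.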

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/156085