AMDFNet: Adaptive multi-level deformable fusion network for RGB-D saliency detection

Li, F; Zheng, J; Zhang, YF; Liu, N; Jia, W

Publisher: Elsevier BV
Publication Type: Journal Article
Citation: Neurocomputing, 2021, 465, pp. 141-156
Issue Date: 2021-11-20

This item is open access.

The embargo period expires on 20 Nov 2023

Full metadata record

Field	Value	Language
dc.contributor.author	Li, F
dc.contributor.author	Zheng, J
dc.contributor.author	Zhang, YF
dc.contributor.author	Liu, N
dc.contributor.author	Jia, W https://orcid.org/0000-0002-0940-3338
dc.date.accessioned	2021-11-08T23:59:56Z
dc.date.available	2021-11-08T23:59:56Z
dc.date.issued	2021-11-20
dc.identifier.citation	Neurocomputing, 2021, 465, pp. 141-156
dc.identifier.issn	0925-2312
dc.identifier.issn	1872-8286
dc.identifier.uri	http://hdl.handle.net/10453/151414
dc.description.abstract	Effective exploration of useful contextual information in multi-modal images is an essential task in salient object detection. Nevertheless, existing methods based on early-fusion or late-fusion schemes cannot address this problem, as they are unable to effectively resolve the distribution gap and information loss. In this paper, we propose an adaptive multi-level deformable fusion network (AMDFNet) to exploit the cross-modality information. We use a cross-modality deformable convolution module to dynamically adjust the boundaries of salient objects by exploring the extra input from another modality. This enables incorporating the existing features and propagating more context so as to strengthen the model's ability to perceive scenes. To accurately refine the predicted maps, a multi-scaled feature refinement module is proposed to enhance the intermediate features with multi-level prediction in the decoder part. Furthermore, we introduce a selective cross-modality attention module in the fusion process to exploit the attention mechanism. This module captures dense long-range cross-modality dependencies from the perspective of multi-modal hierarchical features. This strategy enables the network to select more informative details and suppress the contamination caused by the negative depth maps. Experimental results on eight benchmark datasets demonstrate the effectiveness of the components in our proposed model, as well as the overall saliency model.
dc.language	en
dc.publisher	Elsevier BV
dc.relation.ispartof	Neurocomputing
dc.relation.isbasedon	10.1016/j.neucom.2021.08.116
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering, 17 Psychology and Cognitive Sciences
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	AMDFNet: Adaptive multi-level deformable fusion network for RGB-D saliency detection
dc.type	Journal Article
utslib.citation.volume	465
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
utslib.for	17 Psychology and Cognitive Sciences
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2023-11-20T00:00:00+1000Z
dc.date.updated	2021-11-08T23:59:46Z
pubs.publication-status	Accepted
pubs.volume	465

Abstract:

Effective exploration of useful contextual information in multi-modal images is an essential task in salient object detection. Nevertheless, existing methods based on early-fusion or late-fusion schemes cannot address this problem, as they are unable to effectively resolve the distribution gap and information loss. In this paper, we propose an adaptive multi-level deformable fusion network (AMDFNet) to exploit the cross-modality information. We use a cross-modality deformable convolution module to dynamically adjust the boundaries of salient objects by exploring the extra input from another modality. This enables incorporating the existing features and propagating more context so as to strengthen the model's ability to perceive scenes. To accurately refine the predicted maps, a multi-scaled feature refinement module is proposed to enhance the intermediate features with multi-level prediction in the decoder part. Furthermore, we introduce a selective cross-modality attention module in the fusion process to exploit the attention mechanism. This module captures dense long-range cross-modality dependencies from the perspective of multi-modal hierarchical features. This strategy enables the network to select more informative details and suppress the contamination caused by the negative depth maps. Experimental results on eight benchmark datasets demonstrate the effectiveness of the components in our proposed model, as well as the overall saliency model.
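
The abstract mentions attention that captures dense long-range dependencies between modalities. As a rough illustration of that general idea only, the NumPy sketch below lets every RGB position attend to every depth position using plain scaled dot-product attention. It is not the AMDFNet module: this record does not describe the paper's layers, so the function name `cross_modal_attention`, the absence of learned projections, the residual sum, and the feature shapes are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the per-row maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(rgb_feat, depth_feat):
    """Hypothetical helper: RGB positions attend to all depth positions.

    rgb_feat, depth_feat: arrays of shape (H, W, C) with matching shapes.
    Returns the fused features (H, W, C) and the attention weights (H*W, H*W).
    """
    h, w, c = rgb_feat.shape
    q = rgb_feat.reshape(h * w, c)    # queries come from the RGB stream
    k = depth_feat.reshape(h * w, c)  # keys come from the depth stream
    v = k                             # assumption: values are the raw depth features

    # Dense pairwise similarity between every RGB and every depth position.
    scores = q @ k.T / np.sqrt(c)
    weights = softmax(scores, axis=-1)

    # Each RGB position receives a weighted mix of depth features,
    # added back to the RGB features (assumed residual fusion).
    fused = q + weights @ v
    return fused.reshape(h, w, c), weights


rng = np.random.default_rng(0)
rgb = rng.standard_normal((4, 4, 8))
depth = rng.standard_normal((4, 4, 8))
fused, weights = cross_modal_attention(rgb, depth)
print(fused.shape, weights.shape)
```

Because each row of `weights` sums to one, a depth position with low similarity to a given RGB position contributes little to the fused feature, which is one simple way an attention step can down-weight unhelpful depth input.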

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/151414