Rethinking feature aggregation for deep RGB-D salient object detection

Zhang, YF; Zheng, J; Li, L; Liu, N; Jia, W; Fan, X; Xu, C; He, X

Publisher: Elsevier
Publication Type: Journal Article
Citation: Neurocomputing, 2021, 423, pp. 463-473
Issue Date: 2021-01-29

This item is open access.

The embargo period expires on 29 Jan 2023

Full metadata record

Field	Value	Language
dc.contributor.author	Zhang, YF
dc.contributor.author	Zheng, J
dc.contributor.author	Li, L
dc.contributor.author	Liu, N
dc.contributor.author	Jia, W https://orcid.org/0000-0002-0940-3338
dc.contributor.author	Fan, X
dc.contributor.author	Xu, C
dc.contributor.author	He, X https://orcid.org/0000-0001-8962-540X
dc.date.accessioned	2021-02-06T10:59:03Z
dc.date.available	2021-02-06T10:59:03Z
dc.date.issued	2021-01-29
dc.identifier.citation	Neurocomputing, 2021, 423, pp. 463-473
dc.identifier.issn	0925-2312
dc.identifier.issn	1872-8286
dc.identifier.uri	http://hdl.handle.net/10453/145879
dc.description.abstract	© 2020 Two-stream UNet based architectures are widely used in deep RGB-D salient object detection (SOD) models. However, UNet only adopts a top-down decoder network to progressively aggregate high-level features with low-level ones. In this paper, we propose to enrich feature aggregation via holistic aggregation paths and an extra bottom-up decoder network. The former aggregates multi-level features holistically to learn abundant feature interactions while the latter aggregates improved low-level features with high-level features, thus promoting their representation ability. Aiming at the two-stream architecture, we propose another early aggregation scheme to aggregate and propagate multi-modal encoder features at each level, thereby improving the encoder capability. We also propose a factorized attention module to efficiently modulate the feature aggregation action for each feature node with multiple learned attention factors. Experimental results demonstrate that all of the proposed components can gradually improve RGB-D SOD results. Consequently, our final SOD model performs favorably against other state-of-the-art methods.
dc.language	en
dc.publisher	Elsevier
dc.relation.ispartof	Neurocomputing
dc.relation.isbasedon	10.1016/j.neucom.2020.10.079
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering, 17 Psychology and Cognitive Sciences
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Rethinking feature aggregation for deep RGB-D salient object detection
dc.type	Journal Article
utslib.citation.volume	423
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
utslib.for	17 Psychology and Cognitive Sciences
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CRIN - Realtime Information Networks
pubs.organisational-group	/University of Technology Sydney/Strength - GBDTC - Global Big Data Technologies
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	open_access	*
pubs.consider-herdc	true
utslib.copyright.embargo	2023-01-29T00:00:00+1000Z
dc.date.updated	2021-02-06T10:58:21Z
pubs.publication-status	Published
pubs.volume	423

Abstract:

Two-stream UNet-based architectures are widely used in deep RGB-D salient object detection (SOD) models. However, UNet adopts only a top-down decoder network to progressively aggregate high-level features with low-level ones. In this paper, we propose to enrich feature aggregation via holistic aggregation paths and an extra bottom-up decoder network. The former aggregate multi-level features holistically to learn abundant feature interactions, while the latter aggregates the improved low-level features with high-level features, thus promoting their representation ability. For the two-stream architecture, we propose a further early aggregation scheme that aggregates and propagates multi-modal encoder features at each level, thereby improving the encoder capability. We also propose a factorized attention module that efficiently modulates the feature aggregation for each feature node with multiple learned attention factors. Experimental results demonstrate that each of the proposed components progressively improves RGB-D SOD results. Consequently, our final SOD model performs favorably against other state-of-the-art methods.
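
The contrast the abstract draws between a top-down decoder and an extra bottom-up decoder can be illustrated with a toy sketch. This is not the authors' implementation. It assumes 1-D lists as stand-ins for feature maps, plain element-wise addition as the aggregation step, and hypothetical helper names (`upsample`, `downsample`, `top_down`, `bottom_up`). The learned convolutions, holistic aggregation paths, early multi-modal aggregation, and factorized attention described in the abstract are deliberately left out.

```python
from typing import List

Feature = List[float]  # 1-D stand-in for a feature map (assumption for illustration)


def upsample(x: Feature, size: int) -> Feature:
    """Nearest-neighbour upsampling of a 1-D feature to `size` elements."""
    n = len(x)
    return [x[i * n // size] for i in range(size)]


def downsample(x: Feature, size: int) -> Feature:
    """Average pooling down to `size` elements (len(x) must be divisible by size)."""
    step = len(x) // size
    return [sum(x[i * step:(i + 1) * step]) / step for i in range(size)]


def add(a: Feature, b: Feature) -> Feature:
    """Element-wise sum, used here in place of a learned aggregation."""
    return [p + q for p, q in zip(a, b)]


def top_down(features: List[Feature]) -> List[Feature]:
    """UNet-style pass: start at the highest level and merge into lower levels.

    `features` is ordered from low-level (largest) to high-level (smallest).
    """
    out: List[Feature] = [[] for _ in features]
    out[-1] = list(features[-1])
    for i in range(len(features) - 2, -1, -1):
        out[i] = add(features[i], upsample(out[i + 1], len(features[i])))
    return out


def bottom_up(features: List[Feature]) -> List[Feature]:
    """Extra pass: start at the lowest level and merge into higher levels."""
    out: List[Feature] = [[] for _ in features]
    out[0] = list(features[0])
    for i in range(1, len(features)):
        out[i] = add(features[i], downsample(out[i - 1], len(features[i])))
    return out


# Three toy encoder levels, from low-level (4 elements) to high-level (1 element).
encoder = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0], [4.0]]
td = top_down(encoder)   # high-level information flows down to the low levels
bu = bottom_up(td)       # improved low-level features flow back up
```

In this toy setting the top-down pass only lets high-level information reach the low levels, while the second pass feeds the already-improved low-level features back into the high levels, which is the direction of information flow that the abstract says a plain UNet decoder lacks.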

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/145879