Pyramidal Attention for Saliency Detection

Hussain, T; Anwar, A; Anwar, S; Petersson, L; Baik, SW



Publisher:: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:: Conference Proceeding
Citation:: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2022, pp. 2877-2887
Issue Date:: 2022-06-20

Closed Access

	Filename	Description	Size	Format
	Pyramidal_Attention_for_Saliency_Detection (1).pdf	Published version	8.14 MB	Adobe PDF


This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Hussain, T
dc.contributor.author	Anwar, A
dc.contributor.author	Anwar, S
dc.contributor.author	Petersson, L
dc.contributor.author	Baik, SW
dc.date	2022-06-19
dc.date.accessioned	2023-07-08T01:09:51Z
dc.date.available	2023-07-08T01:09:51Z
dc.date.issued	2022-06-20
dc.identifier.citation	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2022, pp. 2877-2887
dc.identifier.isbn	978-1-6654-8739-9
dc.identifier.issn	2160-7508
dc.identifier.issn	2160-7516
dc.identifier.uri	http://hdl.handle.net/10453/171326
dc.description.abstract	Salient object detection (SOD) extracts meaningful content from an input image. RGB-based SOD methods lack complementary depth cues and hence provide limited performance in complex scenarios. RGB-D models, on the other hand, process both RGB and depth inputs, but the need for depth data at test time may limit their practical applicability. This paper uses only RGB images: it estimates depth from RGB and leverages the intermediate depth features. We employ a pyramidal attention structure to extract multi-level convolutional-transformer features that process the initial-stage representations and further enhance the subsequent ones. At each stage, the backbone transformer model produces global receptive fields and computes in parallel to attain fine-grained global predictions, which are then refined by our residual convolutional attention decoder for optimal saliency prediction. We report significantly improved performance against 21 RGB and 40 RGB-D state-of-the-art SOD methods on eight RGB and RGB-D datasets. Consequently, we present a new SOD perspective: generating RGB-D SOD without acquiring depth data during training and testing, and assisting RGB methods with depth cues for improved performance. The code and trained models are available at https://github.com/tanveer-hussain/EfficientSOD2
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
dc.relation.ispartof	IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
dc.relation.isbasedon	10.1109/cvprw56347.2022.00325
dc.rights	info:eu-repo/semantics/closedAccess
dc.title	Pyramidal Attention for Saliency Detection
dc.type	Conference Proceeding
utslib.citation.volume	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
utslib.location.activity	New Orleans, LA, USA
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2023-07-08T01:09:47Z
pubs.finish-date	2022-06-20
pubs.place-of-publication	Piscataway, USA
pubs.publication-status	Published
pubs.start-date	2022-06-19
pubs.volume	2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
dc.location	Piscataway, USA

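The abstract above describes a pyramid of multi-level features whose coarse, global predictions are refined by a residual convolutional attention decoder. As a rough illustration of that coarse-to-fine idea only, the pure-Python sketch below builds a 2× average-pooling pyramid over a single-channel feature map and refines it level by level with a residual gate of the form x + x·σ(g), where g is the upsampled coarser prediction. All function names are hypothetical. The fixed pooling and sigmoid gating are assumptions standing in for the learned convolutional-transformer layers of the actual model. The authors' implementation is in the EfficientSOD2 repository linked in the abstract.

```python
import math
from typing import List

# Hypothetical illustration, not the authors' model: a single-channel 2D map, row-major.
Map = List[List[float]]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def avg_pool2(x: Map) -> Map:
    """Halve the resolution with 2x2 average pooling (assumes even height and width)."""
    h, w = len(x), len(x[0])
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def upsample2(x: Map) -> Map:
    """Double the resolution with nearest-neighbour upsampling."""
    out: Map = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def residual_attention(x: Map, guide: Map) -> Map:
    """Gate x with a spatial attention map from a coarser guide, keeping a residual path."""
    return [[xv + xv * sigmoid(gv) for xv, gv in zip(xr, gr)]
            for xr, gr in zip(x, guide)]


def build_pyramid(x: Map, levels: int) -> List[Map]:
    """Return [finest, ..., coarsest] maps, each level half the size of the previous one."""
    pyramid = [x]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2(pyramid[-1]))
    return pyramid


def coarse_to_fine(pyramid: List[Map]) -> Map:
    """Start from the coarsest level and refine upward; squash the result into (0, 1)."""
    pred = pyramid[-1]
    for feat in reversed(pyramid[:-1]):
        pred = residual_attention(feat, upsample2(pred))
    return [[sigmoid(v) for v in row] for row in pred]
```

For example, `coarse_to_fine(build_pyramid(x, 3))` on an 8×8 map returns an 8×8 map of values in (0, 1). The residual path (the leading `xv` term) keeps the original features intact when the gate is weak, which is the usual motivation for residual attention in decoders.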

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/171326