SwiMDiff: Scene-Wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image

Tian, J; Lei, J; Zhang, J; Xie, W; Li, Y


Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Transactions on Geoscience and Remote Sensing, 2024, 62, pp. 1-13
Issue Date: 2024-01-01


This item is open access.

The embargo period expires on 28 Feb 2026


Full metadata record

Field	Value	Language
dc.contributor.author	Tian, J
dc.contributor.author	Lei, J https://orcid.org/0000-0003-0851-6565
dc.contributor.author	Zhang, J
dc.contributor.author	Xie, W
dc.contributor.author	Li, Y
dc.date.accessioned	2024-04-29T01:08:22Z
dc.date.available	2024-04-29T01:08:22Z
dc.date.issued	2024-01-01
dc.identifier.citation	IEEE Transactions on Geoscience and Remote Sensing, 2024, 62, pp. 1-13
dc.identifier.issn	0196-2892
dc.identifier.issn	1558-0644
dc.identifier.uri	http://hdl.handle.net/10453/178435
dc.language	en
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Transactions on Geoscience and Remote Sensing
dc.relation.isbasedon	10.1109/TGRS.2024.3371481
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	0404 Geophysics, 0906 Electrical and Electronic Engineering, 0909 Geomatic Engineering
dc.subject.classification	Geological & Geomatics Engineering
dc.subject.classification	37 Earth sciences
dc.subject.classification	40 Engineering
dc.title	SwiMDiff: Scene-Wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image
dc.type	Journal Article
utslib.citation.volume	62
utslib.for	0404 Geophysics
utslib.for	0906 Electrical and Electronic Engineering
utslib.for	0909 Geomatic Engineering
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Electrical and Data Engineering
utslib.copyright.status	open_access	*
utslib.copyright.embargo	2026-02-28T00:00:00+1000Z
dc.date.updated	2024-04-29T01:08:21Z
pubs.publication-status	Published
pubs.volume	62

Abstract:

With recent advancements in aerospace technology, the volume of unlabeled remote sensing image (RSI) data has increased dramatically. Effectively leveraging this data through self-supervised learning (SSL) is vital in the field of remote sensing. However, current methodologies, particularly contrastive learning (CL), a leading SSL method, encounter specific challenges in this domain. First, CL often mistakenly identifies geographically adjacent samples with similar semantic content as negative pairs, leading to confusion during model training. Second, as an instance-level discriminative task, it tends to neglect the essential fine-grained features and complex details inherent in unstructured RSIs. To overcome these obstacles, we introduce SwiMDiff, a novel self-supervised pretraining framework designed for RSIs. SwiMDiff employs a scene-wide matching approach that effectively recalibrates labels to recognize data from the same scene as false negatives. This adjustment makes CL more applicable to the nuances of remote sensing. In addition, SwiMDiff seamlessly integrates CL with a diffusion model. Through the implementation of pixel-level diffusion constraints, we enhance the encoder's ability to capture both the global semantic information and the fine-grained features of the images more comprehensively. Our proposed framework significantly enriches the information available for downstream tasks in remote sensing. Demonstrating exceptional performance in change detection and land-cover classification tasks, SwiMDiff proves its substantial utility and value in the field of remote sensing.
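
The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how same-scene samples are relabeled, so the sketch assumes they are simply dropped from the negatives of an InfoNCE loss, and it assumes the pixel-level diffusion constraint resembles a standard DDPM noise-prediction error. The names `scene_aware_info_nce`, `diffusion_noise_loss`, `scene_ids`, and `predict_noise` are hypothetical.

```python
import numpy as np


def scene_aware_info_nce(z1, z2, scene_ids, temperature=0.1):
    """InfoNCE over two augmented views, ignoring same-scene "false negatives".

    z1, z2    : (N, D) embeddings of two views of the same N images.
    scene_ids : (N,) integer label of the scene each image was cropped from.
    Assumption: same-scene pairs are masked out of the denominator; the
    paper may treat them differently (for example, as extra positives).
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) scaled cosine similarities

    same_scene = scene_ids[:, None] == scene_ids[None, :]
    positives = np.eye(len(scene_ids), dtype=bool)
    false_negatives = same_scene & ~positives
    logits = np.where(false_negatives, -np.inf, logits)

    # Row-wise log-sum-exp; the diagonal is always finite, so row_max is too.
    row_max = logits.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    return float(np.mean(log_denominator - np.diag(logits)))


def diffusion_noise_loss(x0, predict_noise, alpha_bar_t, rng):
    """DDPM-style pixel-level term: noise the image, then score the prediction.

    predict_noise is a hypothetical callable standing in for a denoiser
    conditioned on the encoder; alpha_bar_t is the cumulative noise schedule
    value at the sampled timestep.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return float(np.mean((predict_noise(x_t) - eps) ** 2))
```

Under these assumptions, a combined pretraining objective could be the contrastive term plus a weighted diffusion term, with the weight chosen by validation. Masking same-scene pairs can only lower the contrastive loss relative to plain InfoNCE, because it removes terms from the denominator.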

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/178435