False Correlation Reduction for Offline Reinforcement Learning.

Deng, Z; Fu, Z; Wang, L; Yang, Z; Bai, C; Zhou, T; Wang, Z; Jiang, J


Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Publication Type: Journal Article
Citation: IEEE Trans Pattern Anal Mach Intell, 2023, PP, (99), pp. 1-12
Issue Date: 2023-10-30

Embargoed

	Filename	Description	Size
	False Correlation Reduction for Offline Reinforcement Learning.pdf	Accepted version	12.29 MB


This item is currently unavailable due to the publisher's embargo.

The embargo period expires on 30 Oct 2025

Full metadata record

Field	Value	Language
dc.contributor.author	Deng, Z https://orcid.org/0000-0002-6088-7534
dc.contributor.author	Fu, Z
dc.contributor.author	Wang, L
dc.contributor.author	Yang, Z
dc.contributor.author	Bai, C
dc.contributor.author	Zhou, T
dc.contributor.author	Wang, Z
dc.contributor.author	Jiang, J
dc.date.accessioned	2024-01-09T23:49:22Z
dc.date.available	2024-01-09T23:49:22Z
dc.date.issued	2023-10-30
dc.identifier.citation	IEEE Trans Pattern Anal Mach Intell, 2023, PP, (99), pp. 1-12
dc.identifier.issn	0162-8828
dc.identifier.issn	1939-3539
dc.identifier.uri	http://hdl.handle.net/10453/174170
dc.description.abstract	Offline reinforcement learning (RL) harnesses the power of massive datasets for resolving sequential decision problems. Most existing papers only discuss defending against out-of-distribution (OOD) actions, while we investigate a broader issue: the false correlations between epistemic uncertainty and decision-making, an essential factor that causes suboptimality. In this paper, we propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm. We empirically show that SCORE achieves state-of-the-art (SoTA) performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL). The proposed algorithm introduces an annealing behavior cloning regularizer to help produce a high-quality estimation of uncertainty, which is critical for eliminating false correlations from suboptimality. Theoretically, we justify the rationality of the proposed method and prove its convergence to the optimal policy with a sublinear rate under mild assumptions.
dc.format	Print-Electronic
dc.language	eng
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)
dc.relation.ispartof	IEEE Trans Pattern Anal Mach Intell
dc.relation.isbasedon	10.1109/TPAMI.2023.3328397
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.subject	0801 Artificial Intelligence and Image Processing, 0806 Information Systems, 0906 Electrical and Electronic Engineering
dc.subject.classification	Artificial Intelligence & Image Processing
dc.subject.classification	4603 Computer vision and multimedia computation
dc.subject.classification	4611 Machine learning
dc.title	False Correlation Reduction for Offline Reinforcement Learning.
dc.type	Journal Article
utslib.citation.volume	PP
utslib.location.activity	United States
utslib.for	0801 Artificial Intelligence and Image Processing
utslib.for	0806 Information Systems
utslib.for	0906 Electrical and Electronic Engineering
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	embargoed	*
utslib.copyright.embargo	2025-10-30T00:00:00+1000Z
dc.date.updated	2024-01-09T23:49:17Z
pubs.issue	99
pubs.publication-status	Published online
pubs.volume	PP
utslib.citation.issue	99
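The abstract names two ingredients: an estimate of epistemic uncertainty used to make value estimates pessimistic, and a behavior cloning (BC) regularizer whose weight is annealed during training. The sketch below shows a generic version of each. It is not the authors' implementation, and it is not taken from the paper. Several choices are assumptions made only for illustration: the spread of a Q-ensemble stands in for epistemic uncertainty, the target uses the penalty form mean minus `beta` times standard deviation, the BC weight decays linearly, and actions are scalars. The function names `pessimistic_target`, `annealed_bc_weight` and `actor_loss`, and the parameters `beta` and `initial_weight`, are likewise hypothetical.

```python
import statistics


def pessimistic_target(reward, next_q_ensemble, gamma=0.99, beta=1.0, done=False):
    """Uncertainty-penalized TD target (assumed form): r + gamma * (mean - beta * std).

    The population standard deviation across the ensemble's next-state Q estimates
    is used here as a stand-in for epistemic uncertainty.
    """
    if done:
        # Terminal transition: no bootstrapped value.
        return reward
    mean_q = statistics.fmean(next_q_ensemble)
    std_q = statistics.pstdev(next_q_ensemble)
    return reward + gamma * (mean_q - beta * std_q)


def annealed_bc_weight(step, total_steps, initial_weight=1.0):
    """Hypothetical linear schedule: decay the BC weight from initial_weight to 0."""
    frac = min(step / total_steps, 1.0)
    return initial_weight * (1.0 - frac)


def actor_loss(q_values, policy_actions, dataset_actions, step, total_steps):
    """Hypothetical actor objective for scalar actions.

    Maximizing mean Q is written as minimizing its negative. A mean-squared BC term
    pulls the policy toward dataset actions, and its weight shrinks over training.
    """
    q_term = -statistics.fmean(q_values)
    bc_term = statistics.fmean(
        (a - b) ** 2 for a, b in zip(policy_actions, dataset_actions)
    )
    return q_term + annealed_bc_weight(step, total_steps) * bc_term


if __name__ == "__main__":
    # Ensemble members disagree (8 vs 12), so the target is lowered by one std (2.0).
    print(pessimistic_target(1.0, [8.0, 12.0], gamma=0.5, beta=1.0))  # 5.0
    print(annealed_bc_weight(50, 100))  # 0.5
    print(actor_loss([2.0, 4.0], [0.0, 1.0], [0.0, 0.0], step=50, total_steps=100))  # -2.75
```

With this structure, the BC term dominates early in training and keeps the policy near the data. As the weight shrinks, the uncertainty-penalized critic has more influence. How SCORE actually combines these terms, and which schedule it uses, should be checked against the paper itself.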

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/174170