Deep Reinforcement Learning in Nonstationary Environments With Unknown Change Points

Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Journal Article
Citation:
IEEE Trans Cybern, 2024, PP, pp. 1-14
Issue Date:
2024-02-13
Deep reinforcement learning (DRL) is a powerful tool for learning from interactions with a stationary environment, where the state-transition and reward distributions remain constant throughout the process. In the practical but challenging nonstationary setting, where the state-transition or reward function changes over the course of the interactions, dedicated solutions are needed to keep DRL agents stable and robust. Most existing approaches assume that the change points between the previous and the new environments are known beforehand. Unfortunately, this assumption is impractical in many applications, such as outdoor robots and online recommendation. To address this problem, this article presents a robust DRL algorithm for nonstationary environments with unknown change points. The algorithm actively detects change points by monitoring the joint distribution of states and actions. A detection-boosted, gradient-constrained optimization method then adapts the training of the current policy using the knowledge of formerly well-trained policies; these previous policies and their experience help the current policy adapt rapidly to environmental changes. Experiments show that the proposed method accumulates the highest reward among several alternatives and adapts to new environments the fastest. This work has compelling potential for increasing the environmental adaptability of intelligent agents such as drones, autonomous vehicles, and underwater robots.
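The abstract describes two components, a detector that monitors the joint state-action distribution and a gradient-constrained update that reuses earlier policies, but gives no implementation details. The sketch below is a minimal illustration of that idea, assuming a simple mean-shift statistic over sliding windows for the detector and a GEM-style gradient projection for the constraint; the names `detect_change` and `constrained_gradient`, the drift statistic, and the threshold are hypothetical stand-ins, not the paper's actual method.

```python
import numpy as np

def detect_change(window_old, window_new, threshold=0.5):
    """Flag a change point when the (state, action) samples in two
    sliding windows drift apart.

    Uses the distance between the window means as a simple drift
    statistic; the paper's actual detector is not specified here.
    """
    return np.linalg.norm(window_old.mean(axis=0)
                          - window_new.mean(axis=0)) > threshold

def constrained_gradient(g_new, g_prev):
    """GEM-style projection: if the new policy gradient conflicts
    with a reference gradient from a previously trained policy's
    experience, remove the conflicting component so the update does
    not undo earlier knowledge.
    """
    dot = g_new @ g_prev
    if dot < 0.0:  # update would harm the previous policy's objective
        g_new = g_new - (dot / (g_prev @ g_prev)) * g_prev
    return g_new

# Toy usage: a synthetic shift in the joint (state, action) features.
rng = np.random.default_rng(0)
before = rng.normal(0.0, 1.0, size=(256, 8))
after = rng.normal(0.8, 1.0, size=(256, 8))
print(detect_change(before, after))   # True: the distribution shifted

g = constrained_gradient(np.array([1.0, -1.0]), np.array([0.0, 1.0]))
print(g)                               # [1. 0.]: conflict projected out
```

Once a change is detected, training would switch to the constrained update so that adaptation to the new environment is guided, rather than overwritten, by the formerly well-trained policies.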