Deep Reinforcement Learning in Nonstationary Environments With Unknown Change Points

Publisher:
Institute of Electrical and Electronics Engineers (IEEE)
Publication Type:
Journal Article
Citation:
IEEE Trans Cybern, 2024, PP, pp. 1-14
Issue Date:
2024-02-13
Deep reinforcement learning (DRL) is a powerful tool for learning from interactions with a stationary environment, where the state-transition and reward distributions remain constant throughout the process. In the practical but challenging nonstationary setting, where the state-transition or reward function changes over the course of the interactions, dedicated solutions are needed to keep DRL agents stable and robust. Most existing approaches assume that the change points between the previous and the new environments are known beforehand. Unfortunately, this assumption is impractical in many applications, such as outdoor robots and online recommendation. To address this problem, this article presents a robust DRL algorithm for nonstationary environments with unknown change points. The algorithm actively detects change points by monitoring the joint distribution of states and actions. A detection-boosted, gradient-constrained optimization method then adapts the training of the current policy using the knowledge of formerly well-trained policies; these previous policies and their experience help the current policy adapt rapidly to environmental changes. Experiments show that the proposed method accumulates the highest reward among several alternatives and adapts to new environments the fastest. This work has compelling potential for increasing the environmental adaptability of intelligent agents such as drones, autonomous vehicles, and underwater robots.
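The abstract describes two components, a detector that monitors the joint state-action distribution and a gradient-constrained update that reuses earlier policies, but gives no implementation details. The sketch below is a minimal illustration of that idea, assuming a simple mean-shift statistic over sliding windows for the detector and a GEM-style gradient projection for the constraint; the names `detect_change` and `constrained_gradient`, the drift statistic, and the threshold are hypothetical stand-ins, not the paper's actual method.

```python
import numpy as np

def detect_change(window_old, window_new, threshold=0.5):
    """Flag a change point when the (state, action) samples in two
    sliding windows drift apart.

    Uses the distance between the window means as a simple drift
    statistic; the paper's actual detector is not specified here.
    """
    return np.linalg.norm(window_old.mean(axis=0)
                          - window_new.mean(axis=0)) > threshold

def constrained_gradient(g_new, g_prev):
    """GEM-style projection: if the new policy gradient conflicts
    with a reference gradient from a previously trained policy's
    experience, remove the conflicting component so the update does
    not undo earlier knowledge.
    """
    dot = g_new @ g_prev
    if dot < 0.0:  # update would harm the previous policy's objective
        g_new = g_new - (dot / (g_prev @ g_prev)) * g_prev
    return g_new

# Toy usage: a synthetic shift in the joint (state, action) features.
rng = np.random.default_rng(0)
before = rng.normal(0.0, 1.0, size=(256, 8))
after = rng.normal(0.8, 1.0, size=(256, 8))
print(detect_change(before, after))   # True: the distribution shifted

g = constrained_gradient(np.array([1.0, -1.0]), np.array([0.0, 1.0]))
print(g)                               # [1. 0.]: conflict projected out
```

Once a change is detected, training would switch to the constrained update so that adaptation to the new environment is guided, rather than overwritten, by the formerly well-trained policies.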