Reward space noise for exploration in deep reinforcement learning

Publisher:
World Scientific Publishing
Publication Type:
Journal Article
Citation:
International Journal of Pattern Recognition and Artificial Intelligence, 2021, 35, (10), pp. 1-21
Issue Date:
2021-08-01
A fundamental challenge for reinforcement learning (RL) is how to achieve efficient exploration in initially unknown environments. Most state-of-the-art RL algorithms leverage action space noise to drive exploration. These classical strategies are computationally efficient and straightforward to implement, but they may fail to explore effectively in complex environments. To address this issue, we propose a novel strategy named reward space noise (RSN) for farsighted and consistent exploration in RL. By introducing stochasticity from the reward space, we change the agent's understanding of the environment and thereby perturb its behavior. We find that the simple RSN achieves consistent exploration and scales to complex domains without intensive computational cost. To demonstrate the effectiveness and scalability of the proposed method, we implement a deep Q-learning agent with reward noise and evaluate its exploratory performance on a set of Atari games that are challenging for the naive ε-greedy strategy. The results show that reward noise outperforms action noise in most games and performs comparably in the others. In particular, during early training, the best exploratory performance of reward noise is clearly better than that of action noise, which shows that reward noise quickly explores valuable states and aids in finding the optimal policy.
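As a rough illustration of the idea (a minimal sketch, not the paper's exact formulation), the fragment below perturbs observed rewards with Gaussian noise inside a DQN-style target computation; the function name rsn_td_targets, the noise scale sigma, and the hyperparameter values are all assumptions introduced here for illustration.

import torch
import torch.nn as nn

def rsn_td_targets(target_net: nn.Module,
                   rewards: torch.Tensor,
                   next_states: torch.Tensor,
                   dones: torch.Tensor,
                   gamma: float = 0.99,
                   sigma: float = 0.1) -> torch.Tensor:
    """TD targets for a DQN-style update with Gaussian reward noise (assumed form)."""
    # Reward space noise: perturb the observed reward before bootstrapping,
    # so the perturbation propagates through the learned value estimates
    # and biases the greedy policy consistently across many steps.
    noisy_rewards = rewards + sigma * torch.randn_like(rewards)
    with torch.no_grad():
        # Greedy bootstrap from the target network, zeroed at terminal states.
        next_q = target_net(next_states).max(dim=1).values
    return noisy_rewards + gamma * (1.0 - dones.float()) * next_q

Unlike ε-greedy action noise, which perturbs each action independently, noise injected at the reward level alters the value function itself, so its effect on behavior persists over time rather than resetting at every step.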