Error controlled actor-critic

Gao, X; Chao, F; Zhou, C; Ge, Z; Yang, L; Chang, X; Shang, C; Shen, Q

Publisher: Elsevier
Publication Type: Journal Article
Citation: Information Sciences, 2022, 612, pp. 62-74
Issue Date: 2022-10

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Gao, X
dc.contributor.author	Chao, F
dc.contributor.author	Zhou, C
dc.contributor.author	Ge, Z
dc.contributor.author	Yang, L
dc.contributor.author	Chang, X
dc.contributor.author	Shang, C
dc.contributor.author	Shen, Q
dc.date.accessioned	2023-04-19T20:39:52Z
dc.date.available	2023-04-19T20:39:52Z
dc.date.issued	2022-10
dc.identifier.citation	Information Sciences, 2022, 612, pp. 62-74
dc.identifier.issn	0020-0255
dc.identifier.uri	http://hdl.handle.net/10453/169985
dc.description.abstract	The approximation inaccuracy of the value function in reinforcement learning (RL) algorithms unavoidably leads to an overestimation phenomenon, which has negative effects on the convergence of the algorithms. To limit the negative effects of the approximation error, we propose error controlled actor-critic (ECAC) which ensures the approximation error is limited within the value function. We present an investigation of how approximation inaccuracy can impair the optimization process of actor-critic approaches. In addition, we derive an upper bound for the approximation error of the Q function approximator and discover that the error can be reduced by limiting the KL divergence between every two consecutive policies during policy training. Experiments on a variety of continuous control tasks demonstrate that the proposed actor-critic approach decreases approximation error and outperforms previous model-free RL algorithms by a significant margin.
dc.language	en
dc.publisher	Elsevier
dc.relation.ispartof	Information Sciences
dc.relation.isbasedon	10.1016/j.ins.2022.08.079
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	01 Mathematical Sciences, 08 Information and Computing Sciences, 09 Engineering
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Error controlled actor-critic
dc.type	Journal Article
utslib.citation.volume	612
utslib.for	01 Mathematical Sciences
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2023-04-19T20:39:50Z
pubs.publication-status	Published
pubs.volume	612

Abstract:

The approximation inaccuracy of the value function in reinforcement learning (RL) algorithms unavoidably leads to an overestimation phenomenon, which has negative effects on the convergence of the algorithms. To limit the negative effects of the approximation error, we propose error controlled actor-critic (ECAC), which ensures the approximation error is limited within the value function. We present an investigation of how approximation inaccuracy can impair the optimization process of actor-critic approaches. In addition, we derive an upper bound for the approximation error of the Q function approximator and discover that the error can be reduced by limiting the KL divergence between every two consecutive policies during policy training. Experiments on a variety of continuous control tasks demonstrate that the proposed actor-critic approach decreases approximation error and outperforms previous model-free RL algorithms by a significant margin.
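
The idea of limiting the KL divergence between two consecutive policies can be illustrated with a minimal sketch. It is not the ECAC implementation: the abstract does not give the loss, the direction of the divergence, or the policy class. The sketch assumes diagonal Gaussian policies, measures KL(old || new), and folds the divergence into a hypothetical penalized actor loss; the names `gaussian_kl`, `kl_penalized_actor_loss` and the weight `beta` are illustrative assumptions.

```python
import math


def gaussian_kl(mu_old, std_old, mu_new, std_new):
    """KL(old || new) between two diagonal Gaussian action distributions.

    Assumption: both policies are diagonal Gaussians; the abstract does not
    state the policy class or the direction of the divergence.
    """
    kl = 0.0
    for m0, s0, m1, s1 in zip(mu_old, std_old, mu_new, std_new):
        # Closed-form KL for one dimension, summed over action dimensions.
        kl += math.log(s1 / s0) + (s0 ** 2 + (m0 - m1) ** 2) / (2.0 * s1 ** 2) - 0.5
    return kl


def kl_penalized_actor_loss(q_value, kl, beta=1.0):
    """Hypothetical actor loss: maximise Q while penalising policy change.

    `beta` is an assumed penalty weight, not a parameter taken from the paper.
    """
    return -q_value + beta * kl


# Previous policy and a candidate update for a one-dimensional action.
kl = gaussian_kl([0.0], [1.0], [1.0], [1.0])
loss = kl_penalized_actor_loss(q_value=2.0, kl=kl, beta=0.1)
```

A larger `beta` keeps the updated policy closer to the previous one, which is the mechanism the abstract links to a smaller approximation error in the Q function.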

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/169985