Steering approaches to Pareto-optimal multiobjective reinforcement learning

Vamplew, P; Issabekov, R; Dazeley, R; Foale, C; Berry, A; Moore, T; Creighton, D

Publisher: Elsevier
Publication Type: Journal Article
Citation: Neurocomputing, 2017, 263, pp. 26-38
Issue Date: 2017-11-08

Closed Access

Filename	Description	Size	Format
1-s2.0-S0925231217311013-main.pdf		2.51 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Vamplew, P
dc.contributor.author	Issabekov, R
dc.contributor.author	Dazeley, R
dc.contributor.author	Foale, C
dc.contributor.author	Berry, A https://orcid.org/0000-0002-2499-9491
dc.contributor.author	Moore, T
dc.contributor.author	Creighton, D
dc.date.accessioned	2022-07-14T02:36:31Z
dc.date.available	2022-07-14T02:36:31Z
dc.date.issued	2017-11-08
dc.identifier.citation	Neurocomputing, 2017, 263, pp. 26-38
dc.identifier.issn	0925-2312
dc.identifier.issn	1872-8286
dc.identifier.uri	http://hdl.handle.net/10453/158895
dc.description.abstract	For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent's target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.
dc.language	English
dc.publisher	Elsevier
dc.relation.ispartof	Neurocomputing
dc.relation.isbasedon	10.1016/j.neucom.2016.08.152
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	08 Information and Computing Sciences, 09 Engineering, 17 Psychology and Cognitive Sciences
dc.subject.classification	Artificial Intelligence & Image Processing
dc.title	Steering approaches to Pareto-optimal multiobjective reinforcement learning
dc.type	Journal Article
utslib.citation.volume	263
utslib.for	08 Information and Computing Sciences
utslib.for	09 Engineering
utslib.for	17 Psychology and Cognitive Sciences
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/A/DRsch The Data Science Institute
utslib.copyright.status	closed_access	*
pubs.consider-herdc	false
dc.date.updated	2022-07-14T02:36:30Z
pubs.publication-status	Published
pubs.volume	263

Abstract:

For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent's target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.
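The abstract's opening claim, that non-stationary policies can be advantageous when there are multiple objectives, can be illustrated with a minimal sketch. This is not the paper's w-steering or Q-steering algorithm, whose details are not given in this record. It only shows the general idea that alternating between two deterministic policies yields an average return on the line segment between their return vectors, and that this average can dominate a Pareto-optimal deterministic policy. All return values and function names below are hypothetical.

```python
from typing import List, Tuple

Vec = Tuple[float, ...]


def dominates(a: Vec, b: Vec) -> bool:
    """True if a is at least as good as b on every objective and strictly
    better on at least one (all objectives are maximised)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Vec]) -> List[Vec]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def mixture_return(a: Vec, b: Vec, fraction_a: float) -> Vec:
    """Average return when policy a is followed for fraction_a of the
    episodes and policy b for the remainder."""
    return tuple(fraction_a * x + (1.0 - fraction_a) * y for x, y in zip(a, b))


# Hypothetical two-objective returns of four stationary deterministic policies.
policy_returns = [(1.0, 9.0), (4.0, 5.0), (3.0, 3.0), (9.0, 1.0)]

front = pareto_front(policy_returns)

# Alternate equally between the two extreme policies.
mixed = mixture_return((1.0, 9.0), (9.0, 1.0), 0.5)

print("Deterministic Pareto front:", front)
print("Average return of the 50/50 mixture:", mixed)
print("Mixture dominates (4.0, 5.0):", dominates(mixed, (4.0, 5.0)))
```

In this made-up example the policy returning (4.0, 5.0) is Pareto-optimal among the deterministic policies, yet the alternating behaviour averages (5.0, 5.0), which dominates it. Steering methods, as described in the abstract, address the harder problem of choosing actions online so that the average return approaches a chosen target point.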

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/158895