Bayesian deep reinforcement learning via deep kernel learning

Xuan, J; Lu, J; Yan, Z; Zhang, G

Publication Type: Journal Article
Citation: International Journal of Computational Intelligence Systems, 2018, 12 (1), pp. 164 - 171
Issue Date: 2018-11-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Xuan, J https://orcid.org/0000-0002-8367-6908	en_US
dc.contributor.author	Lu, J https://orcid.org/0000-0003-0690-4732	en_US
dc.contributor.author	Yan, Z https://orcid.org/0000-0003-3368-2100	en_US
dc.contributor.author	Zhang, G https://orcid.org/0000-0003-3960-0583	en_US
dc.date.issued	2018-11-01	en_US
dc.identifier.citation	International Journal of Computational Intelligence Systems, 2018, 12 (1), pp. 164 - 171	en_US
dc.identifier.issn	1875-6891	en_US
dc.identifier.uri	http://hdl.handle.net/10453/130175
dc.relation	http://purl.org/au-research/grants/arc/DP170101632
dc.relation.ispartof	International Journal of Computational Intelligence Systems	en_US
dc.relation.isbasedon	10.2991/ijcis.2018.25905189	en_US
dc.title	Bayesian deep reinforcement learning via deep kernel learning	en_US
dc.type	Journal Article
utslib.citation.volume	1	en_US
utslib.citation.volume	12	en_US
utslib.for	0103 Numerical and Computational Mathematics	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	08 Information and Computing Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	open_access
pubs.issue	1	en_US
pubs.publication-status	Published	en_US
pubs.volume	12	en_US

Abstract:

© 2018, the Authors. Reinforcement learning (RL) addresses sequential decision-making under uncertainty, where an agent interacts with an unknown environment in order to optimise the cumulative long-term reward. Many real-world problems could benefit from RL, e.g., industrial robotics, medical treatment, and trade execution. As a representative model-free RL algorithm, the deep Q-network (DQN) has recently achieved great success on RL problems, and has even exceeded human performance, by introducing deep neural networks. However, such classical deep neural network-based models cannot handle the uncertainty in sequential decision-making well, which limits their learning performance. In this paper, we propose a new model-free RL algorithm based on a Bayesian deep model. Specifically, deep kernel learning (i.e., a Gaussian process with a deep kernel) is adopted in place of classical deep learning models to learn the hidden, complex action-value function; this could encode more uncertainty and take full advantage of the replay memory. Comparative experiments on a standard RL testing platform, OpenAI Gym, show that the proposed algorithm outperforms DQN. Further investigations will be directed to applying RL to support dynamic decision-making in complex environments.
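
To make the central idea concrete, the following is a minimal sketch of a Gaussian process with a deep kernel used as an action-value estimator. It is not the authors' implementation. The network size, the RBF base kernel, the one-hot action encoding, the noise level, and all names (feature_map, gp_posterior, q_values) are assumptions chosen for illustration, and the replay memory is filled with random toy data. In deep kernel learning the network weights and kernel hyperparameters would normally be trained jointly by maximising the GP marginal likelihood; here the weights are fixed at random values to keep the example short.

```python
import numpy as np

def feature_map(x, W1, b1, W2, b2):
    """Small MLP g(x): the 'deep' part of the kernel (hypothetical architecture)."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """RBF base kernel between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(z_train, y_train, z_test, noise=1e-2):
    """Standard GP regression posterior mean and variance on feature vectors."""
    K = rbf(z_train, z_train) + noise * np.eye(len(z_train))
    K_s = rbf(z_train, z_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(z_test, z_test)) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)

rng = np.random.default_rng(0)
state_dim, n_actions, hidden, feat = 4, 2, 16, 3
in_dim = state_dim + n_actions

# Fixed random weights (assumption: a real model would learn these).
W1 = rng.normal(scale=0.5, size=(in_dim, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.5, size=(hidden, feat)); b2 = np.zeros(feat)

def encode(states, actions):
    """Concatenate each state with a one-hot encoding of its action."""
    return np.hstack([states, np.eye(n_actions)[actions]])

# Toy replay memory: random states, actions, and Q-learning targets.
states = rng.normal(size=(50, state_dim))
actions = rng.integers(0, n_actions, size=50)
targets = rng.normal(size=50)
z_train = feature_map(encode(states, actions), W1, b1, W2, b2)

def q_values(state):
    """Posterior mean and variance of Q(state, a) for every action a."""
    x = encode(np.repeat(state[None, :], n_actions, axis=0), np.arange(n_actions))
    z = feature_map(x, W1, b1, W2, b2)
    return gp_posterior(z_train, targets, z)

q_mean, q_var = q_values(states[0])
greedy_action = int(np.argmax(q_mean))
```

Unlike a plain DQN output, each action here comes with a posterior variance alongside its mean, and the whole replay memory enters the prediction directly through the kernel matrix rather than only through gradient updates.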

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/130175