Dynamic choice of state abstraction in Q-learning

Tamassia, M; Zambetta, F; Raffe, WL; Mueller, FF; Li, X

Publication Type: Conference Proceeding
Citation: Frontiers in Artificial Intelligence and Applications, 2016, 285, pp. 46-54
Issue Date: 2016-01-01

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Tamassia, M	en_US
dc.contributor.author	Zambetta, F	en_US
dc.contributor.author	Raffe, WL https://orcid.org/0000-0001-5310-0943	en_US
dc.contributor.author	Mueller, FF	en_US
dc.contributor.author	Li, X	en_US
dc.date.issued	2016-01-01	en_US
dc.identifier.citation	Frontiers in Artificial Intelligence and Applications, 2016, 285 pp. 46 - 54	en_US
dc.identifier.isbn	9781614996712	en_US
dc.identifier.issn	0922-6389	en_US
dc.identifier.uri	http://hdl.handle.net/10453/121842
dc.description.abstract	© 2016 The Authors and IOS Press. Q-learning associates states and actions of a Markov Decision Process to expected future reward through online learning. In practice, however, when the state space is large and experience is still limited, the algorithm will not find a match between current state and experience unless some details describing states are ignored. On the other hand, reducing state information affects long term performance because decisions will need to be made on less informative inputs. We propose a variation of Q-learning that gradually enriches state descriptions, after enough experience is accumulated. This is coupled with an ad-hoc exploration strategy that aims at collecting key information that allows the algorithm to enrich state descriptions earlier. Experimental results obtained by applying our algorithm to the arcade game Pac-Man show that our approach significantly outperforms Q-learning during the learning process while not penalizing long-term performance.	en_US
dc.relation.ispartof	Frontiers in Artificial Intelligence and Applications	en_US
dc.relation.isbasedon	10.3233/978-1-61499-672-9-46	en_US
dc.title	Dynamic choice of state abstraction in Q-learning	en_US
dc.type	Conference Proceeding
utslib.citation.volume	285	en_US
utslib.for	0806 Information Systems	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - PERSWADE - Centre on Persuasive Systems for Wise Adaptive Living
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	285	en_US

Abstract:

© 2016 The Authors and IOS Press. Q-learning associates states and actions of a Markov Decision Process with expected future reward through online learning. In practice, however, when the state space is large and experience is still limited, the algorithm will not find a match between the current state and past experience unless some details describing states are ignored. On the other hand, reducing state information affects long-term performance because decisions must then be made on less informative inputs. We propose a variation of Q-learning that gradually enriches state descriptions after enough experience is accumulated. This is coupled with an ad-hoc exploration strategy that aims at collecting key information that allows the algorithm to enrich state descriptions earlier. Experimental results obtained by applying our algorithm to the arcade game Pac-Man show that our approach significantly outperforms Q-learning during the learning process while not penalizing long-term performance.
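
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the general idea of moving from coarse to rich state descriptions as experience accumulates. It is not the authors' method. Everything in it is an assumption made for illustration: the class name MultiLevelQ, the visit-count threshold min_visits, the rule of picking the finest abstraction that has been visited often enough, and the choice to update every level with the same target. The ad-hoc exploration strategy mentioned in the abstract is not modelled; plain epsilon-greedy stands in for it.

```python
import random
from collections import defaultdict


class MultiLevelQ:
    """Tabular Q-learning over several state abstractions (hypothetical design).

    `abstractions` is a list of functions mapping a raw state to a hashable
    key, ordered from coarsest (index 0) to finest (last index).
    """

    def __init__(self, abstractions, actions, min_visits=5,
                 alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.abstractions = list(abstractions)
        self.actions = list(actions)
        self.min_visits = min_visits  # assumed threshold for "enough experience"
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = [defaultdict(float) for _ in self.abstractions]
        self.visits = [defaultdict(int) for _ in self.abstractions]
        self.rng = random.Random(seed)

    def level_for(self, state):
        """Finest level whose abstract state has enough visits; else level 0."""
        for level in range(len(self.abstractions) - 1, 0, -1):
            key = self.abstractions[level](state)
            if self.visits[level].get(key, 0) >= self.min_visits:
                return level
        return 0

    def value(self, state, action):
        """Q-value read from the table of the currently chosen abstraction."""
        level = self.level_for(state)
        key = self.abstractions[level](state)
        return self.q[level].get((key, action), 0.0)

    def best_action(self, state):
        return max(self.actions, key=lambda a: self.value(state, a))

    def act(self, state):
        """Epsilon-greedy choice (a stand-in, not the paper's exploration)."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def update(self, state, action, reward, next_state, done=False):
        """Standard Q-learning target, applied to every abstraction level."""
        target = reward
        if not done:
            target += self.gamma * max(self.value(next_state, a)
                                       for a in self.actions)
        for level, phi in enumerate(self.abstractions):
            key = phi(state)
            self.visits[level][key] += 1
            old = self.q[level][(key, action)]
            self.q[level][(key, action)] = old + self.alpha * (target - old)


# Usage: states are (x, y) pairs; the coarse view keeps only x.
agent = MultiLevelQ([lambda s: s[0], lambda s: s],
                    actions=["left", "right"], min_visits=2)
agent.update((1, 2), "right", 1.0, (2, 2), done=True)
action = agent.act((1, 3))  # (1, 3) is unseen, so the coarse table is used
```

In this sketch a state that has never been seen in full detail still benefits from what was learned about coarser versions of it, while frequently visited states switch to the richer description once their visit count passes the threshold.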

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/121842