Reward from demonstration in interactive reinforcement learning

Raza, SA; Johnston, B; Williams, MA

Publication Type: Conference Proceeding
Citation: Proceedings of the 29th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2016, 2016, pp. 414-417
Issue Date: 2016-01-01

Closed Access

Filename	Description	Size	Format
FLAIRS-29.pdf	Published version	319.74 kB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Raza, SA https://orcid.org/0000-0001-6570-4808	en_US
dc.contributor.author	Johnston, B	en_US
dc.contributor.author	Williams, MA https://orcid.org/0000-0002-1047-0503	en_US
dc.date.issued	2016-01-01	en_US
dc.identifier.citation	Proceedings of the 29th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2016, 2016, pp. 414 - 417	en_US
dc.identifier.isbn	9781577357568	en_US
dc.identifier.uri	http://hdl.handle.net/10453/92736
dc.description.abstract	Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. In reinforcement learning (RL), reward shaping is used to indicate desirable behavior by assigning a positive or negative reward to the learner's preceding action. However, for reward shaping through human-generated rewards, an important consideration is making the process approachable for humans. Typically, the human teacher must watch the agent's actions closely and assign judgmental feedback based on prior knowledge. This can be a mentally taxing and unpleasant exercise, especially in lengthy teaching sessions. We present a method, Shaping from Interactive Demonstrations (SfID), which takes an action label from the human instead of a judgmental reward. It thereby simplifies the teacher's role to demonstrating the action to select in a given state. We compare SfID with a standard reward shaping approach on the Sokoban domain. The results show that SfID is competitive with standard reward shaping.	en_US
dc.relation	http://purl.org/au-research/grants/arc/DP160102693
dc.relation.ispartof	Proceedings of the 29th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2016	en_US
dc.title	Reward from demonstration in interactive reinforcement learning	en_US
dc.type	Conference Proceeding
utslib.for	0806 Information Systems	en_US
utslib.for	080101 Adaptive Agents and Intelligent Robotics	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US

Abstract:

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. In reinforcement learning (RL), reward shaping is used to indicate desirable behavior by assigning a positive or negative reward to the learner's preceding action. However, for reward shaping through human-generated rewards, an important consideration is making the process approachable for humans. Typically, the human teacher must watch the agent's actions closely and assign judgmental feedback based on prior knowledge. This can be a mentally taxing and unpleasant exercise, especially in lengthy teaching sessions. We present a method, Shaping from Interactive Demonstrations (SfID), which takes an action label from the human instead of a judgmental reward. It thereby simplifies the teacher's role to demonstrating the action to select in a given state. We compare SfID with a standard reward shaping approach on the Sokoban domain. The results show that SfID is competitive with standard reward shaping.
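
The abstract does not say how SfID turns an action label into a reward, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes a hypothetical rule in which the agent receives a positive shaping reward when its action matches the teacher's demonstrated action and a negative one otherwise, and feeds that reward into an ordinary tabular Q-learning update. The function names, reward magnitudes, and learning parameters are all assumptions made for illustration.

```python
from collections import defaultdict


def demonstration_reward(agent_action, demonstrated_action,
                         bonus=1.0, penalty=-1.0):
    """Hypothetical mapping from a teacher's action label to a shaping reward.

    Assumption: agreement with the demonstrated action earns `bonus`,
    disagreement earns `penalty`, and no label yields no shaping signal.
    """
    if demonstrated_action is None:
        return 0.0
    return bonus if agent_action == demonstrated_action else penalty


def q_update(q, state, action, env_reward, shaping_reward,
             next_state, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step with the shaping reward added to the
    environment reward. Returns the updated Q-value."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = env_reward + shaping_reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Illustrative use: the agent moved "up" and the teacher also labelled "up".
actions = ["up", "down", "left", "right"]
q_table = defaultdict(float)  # unseen (state, action) pairs default to 0.0
shaping = demonstration_reward("up", "up")
updated = q_update(q_table, "s0", "up", 0.0, shaping, "s1", actions)
```

Under this assumed rule the teacher only needs to name the action they would take in the current state, and the comparison with the agent's action produces the reward signal automatically.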

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/92736