Reward from Demonstration in Interactive Reinforcement Learning

Publisher: AAAI
Publication Type: Conference Proceeding
Citation: Proceedings of the Twenty-Ninth International Florida Artificial Intelligence Research Society Conference, 2016, pp. 414-417
Issue Date: 2016-05-02
In reinforcement learning (RL), reward shaping is used to indicate desirable behavior by assigning a positive or negative reward to the learner’s preceding action. However, for reward shaping through human-generated rewards, an important consideration is making the process approachable for humans. Typically, the human teacher must watch the agent’s actions closely and assign judgmental feedback based on prior knowledge. This can be a mentally taxing and unpleasant exercise, especially in lengthy teaching sessions. We present a method, Shaping from Interactive Demonstrations (SfID), which takes an action label from the human instead of a judgmental reward. It thereby simplifies the teacher’s role to demonstrating which action to select in a given state. We compare SfID with a standard reward shaping approach in the Sokoban domain. The results show that SfID is competitive with standard reward shaping.
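The abstract does not say how SfID converts an action label into a shaping signal, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes a tabular Q-learning agent and a hypothetical rule: the agent receives a bonus when its action matches the teacher's demonstrated action and a penalty otherwise. The names `shaping_reward` and `q_update`, the bonus and penalty values, and the learning parameters are all invented for illustration.

```python
from collections import defaultdict


def shaping_reward(agent_action, demonstrated_action, bonus=1.0, penalty=-1.0):
    """Map a teacher's action label to a scalar reward (hypothetical rule)."""
    if demonstrated_action is None:
        # The teacher gave no label for this state, so no shaping is applied.
        return 0.0
    return bonus if agent_action == demonstrated_action else penalty


def q_update(q, state, action, env_reward, next_state, actions,
             demonstrated_action=None, alpha=0.5, gamma=0.9):
    """One Q-learning step with the shaping term added to the environment reward."""
    reward = env_reward + shaping_reward(action, demonstrated_action)
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q[(state, action)]


# Assumed action set for a grid world such as Sokoban.
actions = ["up", "down", "left", "right"]
q = defaultdict(float)

# The teacher demonstrates "up" in state "s0"; the agent tries two actions.
matched = q_update(q, "s0", "up", 0.0, "s1", actions, demonstrated_action="up")
mismatched = q_update(q, "s0", "left", 0.0, "s1", actions, demonstrated_action="up")
```

Under these assumptions the teacher only names the action to take in a state and never scores the agent's behavior directly; the scalar reward is derived automatically from whether the agent's choice agrees with the label.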