Reward from demonstration in interactive reinforcement learning
- Publication Type: Conference Proceeding
- Proceedings of the 29th International Florida Artificial Intelligence Research Society Conference, FLAIRS 2016, 2016, pp. 414-417
- Issue Date:
Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. In reinforcement learning (RL), reward shaping is used to indicate desirable behavior by assigning a positive or negative reward to the learner's preceding action. For reward shaping through human-generated rewards, however, an important consideration is making the process approachable for humans. Typically, the human teacher must watch the agent's actions closely and assign judgmental feedback based on prior knowledge. This can be a mentally taxing and unpleasant exercise, especially in lengthy teaching sessions. We present a method, Shaping from Interactive Demonstrations (SfID), which takes an action label from the human instead of a judgmental reward. This simplifies the teacher's role to demonstrating which action to select in a given state. We compare SfID with a standard reward shaping approach on the Sokoban domain. The results show that SfID is competitive with standard reward shaping.
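The abstract does not say how SfID turns an action label into a shaping signal, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical rule in which the agent receives a positive shaping reward when its action matches the teacher's demonstrated action and a negative one otherwise, and it folds that signal into an ordinary tabular Q-learning update. The function names, reward magnitudes, and state labels are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical action set for a grid world such as Sokoban.
ACTIONS = ["up", "down", "left", "right"]


def reward_from_demonstration(agent_action, demonstrated_action,
                              match_reward=1.0, mismatch_reward=-1.0):
    """Assumed rule: convert a teacher's action label into a shaping reward.

    Returns 0.0 when the teacher gave no label for this step.
    """
    if demonstrated_action is None:
        return 0.0
    if agent_action == demonstrated_action:
        return match_reward
    return mismatch_reward


def q_update(q, state, action, env_reward, shaping_reward, next_state,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning step with the shaping reward added in."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = env_reward + shaping_reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# One illustrative step: the agent moved left, the teacher demonstrated right.
q = defaultdict(float)
shaping = reward_from_demonstration("left", "right")
new_value = q_update(q, "s0", "left", 0.0, shaping, "s1")
```

In this sketch the teacher only names the action to take, and the sign of the feedback is derived automatically by comparing it with what the agent did, which is one possible reading of how a demonstration can stand in for a judgmental reward.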