On Designing Socially Acceptable Reward Shaping

Publisher: Springer
Publication Type: Conference Proceeding
Citation: Social Robotics, 2016, pp. 860-869
Issue Date: 2016
For social robots, learning from an ordinary user should be socially appealing. Unfortunately, machine learning demands enormous amounts of human data, and a prolonged interactive teaching session becomes anti-social. We address this problem in the context of reward shaping for reinforcement learning, where efficient shaping expects a continuous stream of rewards from the teacher. We present a simple framework that seeks rewards for a small number of steps from each of a large number of human teachers, thereby simplifying the job of any individual teacher. The framework was tested with online crowd workers on a transport puzzle. We thoroughly analyzed the quality of the learned policies and the crowd's teaching behavior. Our results show that nearly perfect policies can be learned with this framework, and that the crowd generally found it acceptable.
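The core idea above — many teachers each supplying shaping rewards for only a few steps — can be sketched with tabular Q-learning. Everything below is a hypothetical illustration, not the paper's implementation: the 1-D "transport puzzle", the simulated `teacher_feedback` signal standing in for a crowd worker, and all parameter values are assumptions for the sake of a runnable example.

```python
import random

# Hypothetical 1-D transport puzzle: push an item from cell 0 to cell N-1.
# States: positions 0..N-1; actions: 0 = move left, 1 = move right.
N = 6
GOAL = N - 1

def step(state, action):
    """Environment transition: move one cell, sparse reward 1 at the goal."""
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def teacher_feedback(state, action):
    """Stand-in for one crowd worker's shaping reward: +1 for moving
    toward the goal, -1 otherwise (a simulated, always-correct teacher)."""
    return 1.0 if action == 1 else -1.0

def train(num_teachers=50, steps_per_teacher=10,
          alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Q-learning where each of many teachers shapes only a few steps."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N) for a in (0, 1)}
    state = 0
    for _ in range(num_teachers):            # a new teacher takes over...
        for _ in range(steps_per_teacher):   # ...for a short session only
            if rng.random() < eps:
                action = rng.choice((0, 1))
            else:
                action = max((0, 1), key=lambda a: q[(state, a)])
            nxt, env_r, done = step(state, action)
            # Shaped reward = sparse environment reward + teacher's signal.
            shaped = env_r + teacher_feedback(state, action)
            target = shaped + gamma * max(q[(nxt, 0)], q[(nxt, 1)])
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = 0 if done else nxt
    return q

q = train()
# Greedy policy per state: 1 means "move right" (toward the goal).
policy = [max((0, 1), key=lambda a: q[(s, a)]) for s in range(N)]
```

With a consistent shaping signal, the greedy policy quickly becomes "move right" in every non-goal state, even though no single simulated teacher labels more than ten steps — a toy analogue of the paper's claim that short sessions from many teachers can yield near-perfect policies.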