On designing socially acceptable reward shaping

Publication Type:
Conference Proceeding
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2016, vol. 9979 LNAI, pp. 860-869
© Springer International Publishing AG 2016.

For social robots, learning from an ordinary user should be socially appealing. Unfortunately, machine learning demands an enormous amount of human data, and a prolonged interactive teaching session becomes anti-social. We address this problem in the context of reward shaping for reinforcement learning, where efficient shaping expects a continuous stream of rewards from the teacher. We present a simple framework that seeks rewards for only a small number of steps from each of a large number of human teachers, which simplifies the job of each individual teacher. The framework was tested with online crowd workers on a transport puzzle, and we analyzed in detail both the quality of the learned policies and the crowd's teaching behavior. Our results show that nearly perfect policies can be learned using this framework, and the crowd generally found the framework acceptable.
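To make the division of labour concrete, here is a minimal, hypothetical Python sketch, not the authors' implementation. It splits a teaching session into short contiguous blocks of steps, with each block owned by a different teacher, and adds that teacher's feedback to the environment reward. The names `assign_teachers` and `shaped_rewards`, the additive combination of rewards, and the wrap-around when teachers run out are all assumptions of this sketch.

```python
from typing import Callable, List, Sequence

# Hypothetical teacher model: maps (state, action) to a scalar shaping reward.
Teacher = Callable[[int, int], float]


def assign_teachers(num_steps: int, steps_per_teacher: int, num_teachers: int) -> List[int]:
    """Return, for each step index, the index of the teacher responsible for it.

    Each teacher covers a contiguous block of `steps_per_teacher` steps.
    Assumption for this sketch: once every teacher has been used, assignment
    wraps around to the first teacher again.
    """
    if steps_per_teacher <= 0 or num_teachers <= 0:
        raise ValueError("steps_per_teacher and num_teachers must be positive")
    return [(step // steps_per_teacher) % num_teachers for step in range(num_steps)]


def shaped_rewards(
    env_rewards: Sequence[float],
    states: Sequence[int],
    actions: Sequence[int],
    teachers: Sequence[Teacher],
    steps_per_teacher: int,
) -> List[float]:
    """Add each step's human reward (from its assigned teacher) to the environment reward."""
    owners = assign_teachers(len(env_rewards), steps_per_teacher, len(teachers))
    return [
        r + teachers[t](s, a)
        for r, s, a, t in zip(env_rewards, states, actions, owners)
    ]
```

For example, `assign_teachers(7, 3, 10)` gives each of the first three teachers at most three steps (`[0, 0, 0, 1, 1, 1, 2]`), so no single person has to supervise the whole session.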