Perceiving the World: Question-guided Reinforcement Learning for Text-based Games

Xu, Y; Fang, M; Chen, L; Du, Y; Zhou, JT; Zhang, C

Publication Type: Conference Proceeding
Citation: Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2022, 1, pp. 538-560
Issue Date: 2022-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Xu, Y
dc.contributor.author	Fang, M
dc.contributor.author	Chen, L https://orcid.org/0000-0002-6468-5729
dc.contributor.author	Du, Y
dc.contributor.author	Zhou, JT
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154
dc.date.accessioned	2023-05-16T00:39:47Z
dc.date.available	2023-05-16T00:39:47Z
dc.date.issued	2022-01-01
dc.identifier.citation	Proceedings of the Annual Meeting of the Association for Computational Linguistics, 2022, 1, pp. 538-560
dc.identifier.isbn	9781955917216
dc.identifier.issn	0736-587X
dc.identifier.uri	http://hdl.handle.net/10453/170348
dc.description.abstract	Text-based games provide an interactive way to study natural language processing. While deep reinforcement learning (DRL) has proven effective for developing game-playing agents, low sample efficiency and a large action space remain two major challenges that hinder DRL from being applied in the real world. In this paper, we address these challenges by introducing world-perceiving modules, which automatically decompose tasks and prune actions by answering questions about the environment. We then propose a two-phase training framework that decouples language learning from reinforcement learning, which further improves sample efficiency. Experimental results show that the proposed method significantly improves performance and sample efficiency. It is also robust to compound error and limited pre-training data.
dc.language	en
dc.relation.ispartof	Proceedings of the Annual Meeting of the Association for Computational Linguistics
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Perceiving the World: Question-guided Reinforcement Learning for Text-based Games
dc.type	Conference Proceeding
utslib.citation.volume	1
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
utslib.copyright.status	open_access	*
dc.date.updated	2023-05-16T00:39:44Z
pubs.publication-status	Published
pubs.volume	1

Abstract:

Text-based games provide an interactive way to study natural language processing. While deep reinforcement learning (DRL) has proven effective for developing game-playing agents, low sample efficiency and a large action space remain two major challenges that hinder DRL from being applied in the real world. In this paper, we address these challenges by introducing world-perceiving modules, which automatically decompose tasks and prune actions by answering questions about the environment. We then propose a two-phase training framework that decouples language learning from reinforcement learning, which further improves sample efficiency. Experimental results show that the proposed method significantly improves performance and sample efficiency. It is also robust to compound error and limited pre-training data.
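
As a rough illustration of what pruning actions by answering questions about the environment could look like, the toy sketch below filters candidate commands by asking whether the object each command needs is present in the current observation. This is not the authors' implementation: the names `keyword_answerer`, `prune_actions`, and `required_object`, as well as the example observation and commands, are hypothetical, and a substring match stands in for the learned question-answering module the abstract describes.

```python
from typing import Callable, Dict, List


def keyword_answerer(observation: str, obj: str) -> bool:
    """Toy stand-in for a learned QA module (assumption, not from the paper).

    Answers the question "is `obj` present?" with a case-insensitive
    substring match against the observation text.
    """
    return obj.lower() in observation.lower()


def prune_actions(
    observation: str,
    candidate_actions: List[str],
    required_object: Dict[str, str],
    answer: Callable[[str, str], bool],
) -> List[str]:
    """Keep only actions whose required object is reported as present.

    Actions with no entry in `required_object` are always kept.
    """
    kept = []
    for action in candidate_actions:
        obj = required_object.get(action)
        if obj is None or answer(observation, obj):
            kept.append(action)
    return kept


# Hypothetical observation and action set, invented for illustration.
observation = "You are in a kitchen. You see a knife and a carrot on the counter."
candidates = ["take knife", "take apple", "slice carrot", "open fridge", "look"]
required_object = {
    "take knife": "knife",
    "take apple": "apple",
    "slice carrot": "carrot",
    "open fridge": "fridge",
}

pruned = prune_actions(observation, candidates, required_object, keyword_answerer)
print(pruned)
```

A reinforcement learning agent would then choose among the pruned commands only, which is one way a smaller action space can reduce the number of interactions needed for learning.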

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/170348