Reinforced self-attention network: A hybrid of hard and soft attention for sequence modeling

Shen, T; Zhou, T; Long, G; Jiang, J; Wang, S; Zhang, C

Reinforced self-attention network: A hybrid of hard and soft attention for sequence modeling

Shen, T Zhou, T Long, G

Jiang, J

Wang, S Zhang, C

Permalink

Publication Type:: Conference Proceeding
Citation:: IJCAI International Joint Conference on Artificial Intelligence, 2018, 2018-July pp. 4345 - 4352
Issue Date:: 2018-01-01

Open Access

Copyright Clearance Process

Recently Added
In Progress
Open Access

This item is open access.

Adobe PDF

Download Accepted Manuscript versionAdobe PDF (560.59 kB)

View statistics

Full metadata record

Field	Value	Language
dc.contributor.author	Shen, T	en_US
dc.contributor.author	Zhou, T	en_US
dc.contributor.author	Long, G https://orcid.org/0000-0003-3740-9515	en_US
dc.contributor.author	Jiang, J https://orcid.org/0000-0001-5301-7779	en_US
dc.contributor.author	Wang, S	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.issued	2018-01-01	en_US
dc.identifier.citation	IJCAI International Joint Conference on Artificial Intelligence, 2018, 2018-July pp. 4345 - 4352	en_US
dc.identifier.isbn	9780999241127	en_US
dc.identifier.issn	1045-0823	en_US
dc.identifier.uri	http://hdl.handle.net/10453/133763
dc.description.abstract	© 2018 International Joint Conferences on Artificial Intelligence.All right reserved. Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies by soft probabilities between every two tokens, but they are not effective and efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into one context fusion model, “reinforced self-attention (ReSA)”, for the mutual benefit of each other. In ReSA, a hard attention trims a sequence for a soft self-attention to process, while the soft attention feeds reward signals back to facilitate the training of the hard one. For this purpose, we develop a novel hard attention called “reinforced sequence sampling (RSS)”, selecting tokens in parallel and trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. We finally propose an RNN/CNN-free sentence-encoding model, “reinforced self-attention network (ReSAN)”, solely based on ReSA. It achieves state-of-the-art performance on both Stanford Natural Language Inference (SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets.	en_US
dc.relation.ispartof	IJCAI International Joint Conference on Artificial Intelligence	en_US
dc.title	Reinforced self-attention network: A hybrid of hard and soft attention for sequence modeling	en_US
dc.type	Conference Proceeding
utslib.citation.volume	2018-July	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US
pubs.volume	2018-July	en_US

Abstract:

© 2018 International Joint Conferences on Artificial Intelligence.All right reserved. Many natural language processing tasks solely rely on sparse dependencies between a few tokens in a sentence. Soft attention mechanisms show promising performance in modeling local/global dependencies by soft probabilities between every two tokens, but they are not effective and efficient when applied to long sentences. By contrast, hard attention mechanisms directly select a subset of tokens but are difficult and inefficient to train due to their combinatorial nature. In this paper, we integrate both soft and hard attention into one context fusion model, “reinforced self-attention (ReSA)”, for the mutual benefit of each other. In ReSA, a hard attention trims a sequence for a soft self-attention to process, while the soft attention feeds reward signals back to facilitate the training of the hard one. For this purpose, we develop a novel hard attention called “reinforced sequence sampling (RSS)”, selecting tokens in parallel and trained via policy gradient. Using two RSS modules, ReSA efficiently extracts the sparse dependencies between each pair of selected tokens. We finally propose an RNN/CNN-free sentence-encoding model, “reinforced self-attention network (ReSAN)”, solely based on ReSA. It achieves state-of-the-art performance on both Stanford Natural Language Inference (SNLI) and Sentences Involving Compositional Knowledge (SICK) datasets.

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/133763