DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding

Shen, T; Jiang, J; Zhou, T; Pan, S; Long, G; Zhang, C

Publication Type: Conference Proceeding
Citation: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018, pp. 5446 - 5455
Issue Date: 2018-01-01

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Shen, T	en_US
dc.contributor.author	Jiang, J https://orcid.org/0000-0001-5301-7779	en_US
dc.contributor.author	Zhou, T	en_US
dc.contributor.author	Pan, S https://orcid.org/0000-0003-0794-527X	en_US
dc.contributor.author	Long, G https://orcid.org/0000-0003-3740-9515	en_US
dc.contributor.author	Zhang, C https://orcid.org/0000-0001-5715-7154	en_US
dc.date.issued	2018-01-01	en_US
dc.identifier.citation	32nd AAAI Conference on Artificial Intelligence, AAAI 2018, 2018, pp. 5446 - 5455	en_US
dc.identifier.isbn	9781577358008	en_US
dc.identifier.uri	http://hdl.handle.net/10453/129575
dc.description.abstract	Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved. Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, “Directional Self-Attention Network (DiSAN)”, is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.	en_US
dc.relation	http://purl.org/au-research/grants/arc/LP150100671
dc.relation	http://purl.org/au-research/grants/arc/LP160100630
dc.relation.ispartof	32nd AAAI Conference on Artificial Intelligence, AAAI 2018	en_US
dc.title	DiSAN: Directional Self-Attention Network for RNN/CNN-Free Language Understanding	en_US
dc.type	Conference Proceeding
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/DVC (International)
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - ACRI - Australia China Relations Institute
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Students
utslib.copyright.status	open_access
pubs.publication-status	Published	en_US

Abstract:

Copyright © 2018, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used on NLP tasks to capture the long-term and local dependencies, respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional (i.e., feature-wise). A light-weight neural net, “Directional Self-Attention Network (DiSAN)”, is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite its simple form, DiSAN outperforms complicated RNN models on both prediction quality and time efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by 1.02% on the Stanford Natural Language Inference (SNLI) dataset, and shows state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Multi-Genre natural language inference (MultiNLI), Sentences Involving Compositional Knowledge (SICK), Customer Review, MPQA, TREC question-type classification and Subjectivity (SUBJ) datasets.
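
The two ideas named in the abstract, attention that is directional and attention that is multi-dimensional (feature-wise), can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The abstract gives no equations, so the scoring function (a per-feature `tanh` with hypothetical weights `w_src`, `w_tgt`, `bias`), the choice to let a token attend to itself, and all function names are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf get weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def directional_mask(n, forward=True):
    """mask[i][j] is 0.0 if target j may attend to source i, else -inf.

    Assumption: a token may attend to itself, so each target position
    always has at least one finite score.
    """
    def allowed(i, j):
        return i <= j if forward else i >= j
    return [[0.0 if allowed(i, j) else float("-inf") for j in range(n)]
            for i in range(n)]


def directional_attention(x, w_src, w_tgt, bias, forward=True):
    """Feature-wise self-attention restricted to one direction.

    x is a list of n token vectors with d features each. For every target
    position j and every feature k, a separate softmax over the source
    positions i is computed, instead of one scalar weight per token pair.
    """
    n, d = len(x), len(x[0])
    mask = directional_mask(n, forward)
    out = []
    for j in range(n):
        context = []
        for k in range(d):
            scores = [
                math.tanh(w_src[k] * x[i][k] + w_tgt[k] * x[j][k] + bias[k])
                + mask[i][j]
                for i in range(n)
            ]
            weights = softmax(scores)
            context.append(sum(wt * x[i][k] for i, wt in enumerate(weights)))
        out.append(context)
    return out


def compress(x, w, bias):
    """Feature-wise attention that pools a sequence into a single vector."""
    n, d = len(x), len(x[0])
    vec = []
    for k in range(d):
        weights = softmax([math.tanh(w[k] * x[i][k] + bias[k]) for i in range(n)])
        vec.append(sum(wt * x[i][k] for i, wt in enumerate(weights)))
    return vec


# Toy input: 3 tokens, 2 features each (values are made up).
x = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
ones, zeros = [1.0, 1.0], [0.0, 0.0]
fw = directional_attention(x, ones, ones, zeros, forward=True)
bw = directional_attention(x, ones, ones, zeros, forward=False)
sentence_vector = compress(fw, ones, zeros)
```

In this sketch the forward pass lets each token see only itself and earlier tokens, and the backward pass only itself and later tokens, which is one simple way to encode temporal order without recurrence. Because every feature has its own softmax, two features of the same token can draw on different context positions.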

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/129575