Improving Machine Translation and Summarization with the Sinkhorn Divergence

Publisher:
Springer Nature
Publication Type:
Conference Proceeding
Citation:
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2023, 13938 LNCS, pp. 149-161
Issue Date:
2023-01-01
Abstract:
Important natural language processing tasks such as machine translation and document summarization have made enormous strides in recent years. However, their performance is still partially limited by the standard training objectives, which operate on single tokens rather than on more global features. Moreover, such objectives do not explicitly consider the source documents, potentially weakening the alignment between the predictions and their sources. For these reasons, in this paper we propose using an Optimal Transport (OT) training objective to promote a global alignment between the model’s predictions and the source documents. In addition, we present an original implementation of the OT objective based on the Sinkhorn divergence between the final hidden states of the model’s encoder and decoder. Experimental results on machine translation and abstractive summarization tasks show that the proposed approach achieves statistically significant improvements over our baseline and other alternative objectives across all experimental settings. A qualitative analysis of the results also shows that, thanks to the supervision of the proposed objective, the predictions align better with the source sentences.
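Since the abstract only sketches the objective, the following is a minimal, illustrative PyTorch sketch of a debiased Sinkhorn divergence computed between two sets of hidden states. The squared-Euclidean cost, uniform point weights, the hyperparameters eps and n_iters, and all function names are assumptions made for illustration; this is not the authors' implementation.

```python
import math
import torch

def _entropic_ot(x, y, eps=0.1, n_iters=100):
    """Entropy-regularized OT cost between two sets of vectors with uniform weights."""
    cost = torch.cdist(x, y, p=2) ** 2                     # (m, n) squared Euclidean costs
    m, n = cost.shape
    log_a = torch.full((m,), -math.log(m), dtype=x.dtype, device=x.device)
    log_b = torch.full((n,), -math.log(n), dtype=x.dtype, device=x.device)
    f = torch.zeros_like(log_a)
    g = torch.zeros_like(log_b)
    for _ in range(n_iters):                               # log-domain Sinkhorn updates
        f = -eps * torch.logsumexp((g[None, :] - cost) / eps + log_b[None, :], dim=1)
        g = -eps * torch.logsumexp((f[:, None] - cost) / eps + log_a[:, None], dim=0)
    # Transport plan P_ij = a_i * b_j * exp((f_i + g_j - C_ij) / eps); return <P, C>.
    log_plan = (f[:, None] + g[None, :] - cost) / eps + log_a[:, None] + log_b[None, :]
    return (log_plan.exp() * cost).sum()

def sinkhorn_divergence(x, y, eps=0.1, n_iters=100):
    """Debiased Sinkhorn divergence: OT(x, y) - 0.5 * OT(x, x) - 0.5 * OT(y, y)."""
    return (_entropic_ot(x, y, eps, n_iters)
            - 0.5 * _entropic_ot(x, x, eps, n_iters)
            - 0.5 * _entropic_ot(y, y, eps, n_iters))

# Hypothetical final hidden states for one example: source length 17, target
# length 23, hidden size 512. In the setting the paper describes, a term like
# this would supplement the usual token-level loss as a global alignment signal.
enc_states = torch.randn(17, 512)
dec_states = torch.randn(23, 512)
print(sinkhorn_divergence(enc_states, dec_states))
```

The log-domain updates keep the Sinkhorn iterations numerically stable at small eps, and the two self-transport terms debias the entropic cost so the divergence tends to zero when the two sets of states coincide.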