Medical Question Summarization with Entity-driven Contrastive Learning

Lu, W; Wei, S; Peng, X; Wang, Y-F; Naseem, U; Wang, S

Publisher: Association for Computing Machinery (ACM)
Publication Type: Journal Article
Citation: ACM Transactions on Asian and Low-Resource Language Information Processing

Closed Access

	Filename	Description	Size
	77. Medical QA. ACM Trans.pdf	Accepted version	768.2 kB

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Lu, W
dc.contributor.author	Wei, S
dc.contributor.author	Peng, X https://orcid.org/0000-0002-8901-1472
dc.contributor.author	Wang, Y-F
dc.contributor.author	Naseem, U
dc.contributor.author	Wang, S https://orcid.org/0000-0003-1133-9379
dc.date.accessioned	2024-03-27T00:45:53Z
dc.date.available	2024-03-27T00:45:53Z
dc.identifier.citation	ACM Transactions on Asian and Low-Resource Language Information Processing
dc.identifier.issn	2375-4699
dc.identifier.issn	2375-4702
dc.identifier.uri	http://hdl.handle.net/10453/177206
dc.description.abstract	By summarizing longer consumer health questions into shorter, essential ones, medical question-answering systems can more accurately understand consumer intentions and retrieve suitable answers. However, medical question summarization is very challenging because patients and doctors describe health problems in markedly different ways. Although deep learning has been applied successfully to the medical question summarization (MQS) task, two challenges remain: how to correctly capture the question focus to model its semantic intention, and how to obtain reliable datasets for fair performance evaluation. To address these challenges, this paper proposes a novel medical question summarization framework based on entity-driven contrastive learning (ECL). ECL uses the medical entities present in frequently asked questions (FAQs) as focuses and devises an effective mechanism to generate hard negative samples. This approach compels models to attend to essential information and consequently generate more accurate question summaries. Furthermore, we have discovered that some MQS datasets have significant data leakage issues; the iCliniq dataset, for example, has a 33% duplicate rate. To ensure an impartial evaluation of the related methods, this paper carefully examines the leaked samples and reorganizes the datasets more reasonably. Extensive experiments demonstrate that our ECL method outperforms existing methods and achieves new state-of-the-art performance, with ROUGE-1 scores of 52.85, 43.16, 41.31, and 43.52 on the MeQSum, CHQ-Summ, iCliniq, and HealthCareMagic datasets, respectively. The code and datasets are available at https://github.com/yrbobo/MQS-ECL.
dc.language	en
dc.publisher	Association for Computing Machinery (ACM)
dc.relation.ispartof	ACM Transactions on Asian and Low-Resource Language Information Processing
dc.relation.isbasedon	10.1145/3652160
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject.classification	46 Information and computing sciences
dc.subject.classification	47 Language, communication and culture
dc.title	Medical Question Summarization with Entity-driven Contrastive Learning
dc.type	Journal Article
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Strength - AAII - Australian Artificial Intelligence Institute
utslib.copyright.status	closed_access	*
dc.date.updated	2024-03-27T00:45:52Z
pubs.publication-status	Published online

Abstract:

By summarizing longer consumer health questions into shorter, essential ones, medical question-answering systems can more accurately understand consumer intentions and retrieve suitable answers. However, medical question summarization is very challenging because patients and doctors describe health problems in markedly different ways. Although deep learning has been applied successfully to the medical question summarization (MQS) task, two challenges remain: how to correctly capture the question focus to model its semantic intention, and how to obtain reliable datasets for fair performance evaluation. To address these challenges, this paper proposes a novel medical question summarization framework based on entity-driven contrastive learning (ECL). ECL uses the medical entities present in frequently asked questions (FAQs) as focuses and devises an effective mechanism to generate hard negative samples. This approach compels models to attend to essential information and consequently generate more accurate question summaries. Furthermore, we have discovered that some MQS datasets have significant data leakage issues; the iCliniq dataset, for example, has a 33% duplicate rate. To ensure an impartial evaluation of the related methods, this paper carefully examines the leaked samples and reorganizes the datasets more reasonably. Extensive experiments demonstrate that our ECL method outperforms existing methods and achieves new state-of-the-art performance, with ROUGE-1 scores of 52.85, 43.16, 41.31, and 43.52 on the MeQSum, CHQ-Summ, iCliniq, and HealthCareMagic datasets, respectively. The code and datasets are available at https://github.com/yrbobo/MQS-ECL.
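
The abstract reports a duplicate rate for iCliniq but this record does not say how duplicates were identified. As a rough illustration only, the sketch below shows one plausible way to measure train/test leakage: normalize each question and count test questions whose normalized text also appears in the training split. The function names (normalize, duplicate_rate, remove_leaked), the exact-match criterion, and the toy questions are all assumptions made for this example, not the authors' procedure or data.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def duplicate_rate(train_questions, test_questions):
    """Fraction of test questions whose normalized text also occurs in train.

    Assumption: a "duplicate" is an exact match after normalization.
    The paper may use a different criterion.
    """
    if not test_questions:
        return 0.0
    seen = {normalize(q) for q in train_questions}
    leaked = sum(1 for q in test_questions if normalize(q) in seen)
    return leaked / len(test_questions)


def remove_leaked(train_questions, test_questions):
    """Return the test questions that do not also appear in train."""
    seen = {normalize(q) for q in train_questions}
    return [q for q in test_questions if normalize(q) not in seen]


# Toy data, invented for illustration (not taken from any MQS dataset).
train = [
    "What causes migraine headaches?",
    "Is ibuprofen safe during pregnancy?",
]
test = [
    "what causes migraine headaches",
    "How is type 2 diabetes treated?",
    "Is ibuprofen safe during pregnancy?",
]

rate = duplicate_rate(train, test)
clean_test = remove_leaked(train, test)
print(f"duplicate rate: {rate:.2%}")
print(clean_test)
```

Exact matching after normalization is the simplest criterion; near-duplicate detection (for example, token-overlap thresholds) would catch more leaked samples but needs a tuned threshold.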

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/177206