Statistical supervised meta-ensemble algorithm for medical record linkage.

Vo, K; Jonnagaddala, J; Liaw, S-T

Publisher: Elsevier BV
Publication Type: Journal Article
Citation: Journal of biomedical informatics, 2019, 95, pp. 103220
Issue Date: 2019-07

Closed Access

Filename	Description	Size	Format
1-s2.0-S1532046419301388-main.pdf	Published version	2.64 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Vo, K https://orcid.org/0000-0002-1796-3095
dc.contributor.author	Jonnagaddala, J
dc.contributor.author	Liaw, S-T
dc.date.accessioned	2020-06-06T02:10:45Z
dc.date.available	2019-05-28
dc.date.available	2020-06-06T02:10:45Z
dc.date.issued	2019-07
dc.identifier.citation	Journal of biomedical informatics, 2019, 95, pp. 103220
dc.identifier.issn	1532-0464
dc.identifier.issn	1532-0480
dc.identifier.uri	http://hdl.handle.net/10453/141173
dc.description.abstract	Identifying unique patients across multiple care facilities or services is a major challenge in providing continuous care and undertaking health research. Identifying and linking patients without compromising privacy and security is an emerging issue in the big data era. The large quantity and complexity of patient data emphasize the need for effective linkage methods that are both scalable and accurate. In this study, we aim to develop and evaluate an ensemble classification method using the three most commonly used supervised learning methods, namely support vector machines, logistic regression and standard feed-forward neural networks, to link records that belong to the same patient across multiple service locations. Our ensemble method combines bagging and stacking. The critical hyperparameters of each base learner were selected using a grid search. Two synthetic datasets, FEBRL and ePBRN, were used in this study. The ePBRN linkage dataset was based on linkage errors observed in the Australian primary care setting. Overall linkage performance was determined by assessing both blocking performance and classification performance. Our ensemble method outperformed the base learners in all evaluation metrics on one dataset. More specifically, precision (for the base learners, the average of their individual precision scores) increased from 90.70% to 94.85% in FEBRL, and from 62.17% to 99.28% in ePBRN. Similarly, the F-score increased from 94.92% to 98.18% in FEBRL, and from 72.99% to 91.72% in ePBRN. Our experiments suggest that ensemble strategies can significantly improve the linkage performance of individual algorithms.
dc.format	Print-Electronic
dc.language	eng
dc.publisher	Elsevier BV
dc.relation.ispartof	Journal of biomedical informatics
dc.relation.isbasedon	10.1016/j.jbi.2019.103220
dc.rights	info:eu-repo/semantics/closedAccess
dc.subject	06 Biological Sciences, 08 Information and Computing Sciences, 11 Medical and Health Sciences
dc.subject.classification	Medical Informatics
dc.subject.classification	Biomedical Engineering
dc.title	Statistical supervised meta-ensemble algorithm for medical record linkage.
dc.type	Journal Article
utslib.citation.volume	95
utslib.location.activity	United States
utslib.for	06 Biological Sciences
utslib.for	08 Information and Computing Sciences
utslib.for	11 Medical and Health Sciences
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Strength - CAI - Centre for Artificial Intelligence
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney
utslib.copyright.status	closed_access	*
dc.date.updated	2020-06-06T02:10:41Z
pubs.publication-status	Published
pubs.volume	95
utslib.start-page	103220
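
The abstract describes the ensemble only at a high level: support vector machines, logistic regression and feed-forward neural networks as base learners, hyperparameters chosen by grid search, and a combination of bagging and stacking. The sketch below shows one plausible way to wire those pieces together with scikit-learn; it is not the authors' implementation. Assumptions: each tuned base learner is wrapped in its own bagging ensemble, the bagged learners are stacked under a logistic-regression meta-learner, the hyperparameter grids are illustrative, and the comparison vectors are synthetic stand-ins rather than FEBRL or ePBRN data.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# Hypothetical comparison vectors: one row per candidate record pair, one column
# per compared field (e.g. surname, given name, date of birth, postcode), each a
# similarity score in [0, 1]. True matches tend to score high on most fields.
rng = np.random.default_rng(0)
n_pairs, n_fields = 600, 6
y = (rng.random(n_pairs) < 0.3).astype(int)  # roughly 30% true matches
X = np.where(y[:, None] == 1,
             rng.beta(5, 1.5, size=(n_pairs, n_fields)),   # matches: high similarity
             rng.beta(1.5, 5, size=(n_pairs, n_fields)))   # non-matches: low similarity
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Step 1: grid search over illustrative hyperparameters for each base learner.
base_grids = {
    "svm": (SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}),
    "lr": (LogisticRegression(max_iter=1000), {"C": [0.1, 1, 10]}),
    "nn": (MLPClassifier(max_iter=2000, random_state=0),
           {"hidden_layer_sizes": [(8,), (16,)], "alpha": [1e-4, 1e-2]}),
}
tuned = {}
for name, (model, grid) in base_grids.items():
    search = GridSearchCV(model, grid, scoring="f1", cv=3)
    search.fit(X_train, y_train)
    tuned[name] = search.best_estimator_  # refit on the full training split

# Step 2: bag each tuned learner, then stack the bagged learners under a
# logistic-regression meta-learner (an assumed choice of meta-learner).
ensemble = StackingClassifier(
    estimators=[(name, BaggingClassifier(est, n_estimators=5, random_state=0))
                for name, est in tuned.items()],
    final_estimator=LogisticRegression(),
    cv=3,
)
ensemble.fit(X_train, y_train)

def report(model, label):
    """Print and return (precision, recall, F-score) on the held-out pairs."""
    pred = model.predict(X_test)
    scores = (precision_score(y_test, pred),
              recall_score(y_test, pred),
              f1_score(y_test, pred))
    print(f"{label:>8}: precision={scores[0]:.3f} recall={scores[1]:.3f} F={scores[2]:.3f}")
    return scores

base_scores = {name: report(est, name) for name, est in tuned.items()}
ens_scores = report(ensemble, "ensemble")
```

Bagging reduces each learner's variance before stacking, and the meta-learner then learns how far to trust each bagged learner's match probability. On real data, the comparison vectors would come from field-level string or date similarity functions applied to the candidate pairs produced by blocking.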

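
The abstract assesses linkage in two stages, blocking and classification, without naming the blocking measures. The pure-Python sketch below uses two measures common in the record linkage literature: reduction ratio (the share of all record pairs that blocking removes) and pairs completeness (the share of true matches that survive blocking), alongside pairwise precision, recall and F-score. The records, the blocking key (first letter of surname plus postcode) and the classifier output are hypothetical toy values, not FEBRL or ePBRN data.

```python
from itertools import combinations

def blocking_key(record):
    """Hypothetical blocking key: first letter of surname plus postcode."""
    return (record["surname"][:1].upper(), record["postcode"])

def candidate_pairs(records):
    """Group records by blocking key and return all within-block pairs."""
    blocks = {}
    for rid, rec in records.items():
        blocks.setdefault(blocking_key(rec), []).append(rid)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

def blocking_quality(records, pairs, true_matches):
    """Return (reduction ratio, pairs completeness) for a set of candidate pairs."""
    n = len(records)
    total_pairs = n * (n - 1) // 2
    reduction_ratio = 1 - len(pairs) / total_pairs
    pairs_completeness = len(pairs & true_matches) / len(true_matches)
    return reduction_ratio, pairs_completeness

def linkage_quality(predicted, true_matches):
    """Return (precision, recall, F-score) of predicted matching pairs."""
    tp = len(predicted & true_matches)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(true_matches) if true_matches else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Toy records: r1-r3 are one patient, r4, r5 and r7 are another.
records = {
    "r1": {"surname": "Nguyen", "postcode": "2000"},
    "r2": {"surname": "Nguyen", "postcode": "2000"},
    "r3": {"surname": "Ngyuen", "postcode": "2000"},  # transposed letters
    "r4": {"surname": "Smith", "postcode": "2031"},
    "r5": {"surname": "Smyth", "postcode": "2031"},   # spelling variant
    "r6": {"surname": "Jones", "postcode": "2150"},
    "r7": {"surname": "Smith", "postcode": "2032"},   # different postcode
}
true_matches = {("r1", "r2"), ("r1", "r3"), ("r2", "r3"),
                ("r4", "r5"), ("r4", "r7"), ("r5", "r7")}

pairs = candidate_pairs(records)
rr, pc = blocking_quality(records, pairs, true_matches)
print(f"reduction ratio={rr:.3f}, pairs completeness={pc:.3f}")
# prints: reduction ratio=0.810, pairs completeness=0.667

# Hypothetical classifier output over the candidate pairs (misses r2-r3).
predicted = {("r1", "r2"), ("r1", "r3"), ("r4", "r5")}
precision, recall, f_score = linkage_quality(predicted, true_matches)
print(f"precision={precision:.3f}, recall={recall:.3f}, F={f_score:.3f}")
# prints: precision=1.000, recall=0.500, F=0.667
```

Here blocking keeps only 4 of the 21 possible pairs but drops the two true matches involving r7, whose postcode differs. No classifier can recover a pair that blocking never proposes, which is one reason to assess blocking and classification performance separately.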

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/141173