Identifying people at risk of developing type 2 diabetes: A comparison of predictive analytics techniques and predictor variables

Talaei-Khoei, A; Wilson, JM

Publication Type: Journal Article
Citation: International Journal of Medical Informatics, 2018, 119, pp. 22-38
Issue Date: 2018-11-01

Closed Access

Filename	Description	Size	Format
(ASCE)MT.1943-5533.0002129.pdf	Published Version	1.23 MB	Adobe PDF

This item is closed access and not available.

Full metadata record

Field	Value	Language
dc.contributor.author	Talaei-Khoei, A	en_US
dc.contributor.author	Wilson, JM	en_US
dc.date.available	2018-08-16	en_US
dc.date.issued	2018-11-01	en_US
dc.identifier.citation	International Journal of Medical Informatics, 2018, 119 pp. 22 - 38	en_US
dc.identifier.issn	1386-5056	en_US
dc.identifier.uri	http://hdl.handle.net/10453/132041
dc.description.abstract	© 2018 Elsevier B.V. Background: The present study aims to identify patients at risk of type 2 diabetes (T2D). There is a body of literature that uses machine learning classification algorithms to predict the development of T2D among patients. The current study compares the performance of these classification algorithms in identifying patients who are at risk of developing T2D in the short, medium and long term. In addition, a list of the predictor variables important for predicting T2D progression is provided. Methods: This study uses 10,911 records generated in 36 clinics from 15 November 2008 to 15 November 2016. Synthetic minority oversampling and random undersampling were used to create a balanced dataset. The performance of Neural Networks, Support Vector Machines, Decision Trees and Logistic Regression in identifying patients developing T2D in the short, medium and long term was compared. The measures were Area Under Curve, Sensitivity, Specificity, Matthews correlation coefficient and Mean Calibration Error. Through importance analysis and information fusion techniques, the predictors of developing T2D were identified for short-, medium- and long-term risk analysis. Results: The findings show that the performance of analytics techniques depends on both the period and the purpose of prediction, that is, whether the prediction is to identify people who will not develop T2D or to determine at-risk patients. Oversampling, as opposed to undersampling, improved performance. Sixteen predictors and their importance in determining patients at risk of T2D in the short, medium and long term were identified. Conclusions: This study provides guidelines for an automated system to prompt patients for screening. Several predictors are reportable by patients; others can be examined by physicians or ordered for further lab examination, which offers a potential reduction of the burden placed upon clinical settings.	en_US
dc.relation.ispartof	International Journal of Medical Informatics	en_US
dc.relation.isbasedon	10.1016/j.ijmedinf.2018.08.008	en_US
dc.subject.classification	Medical Informatics	en_US
dc.subject.mesh	Humans	en_US
dc.subject.mesh	Diabetes Mellitus, Type 2	en_US
dc.subject.mesh	Disease Progression	en_US
dc.subject.mesh	Mass Screening	en_US
dc.subject.mesh	Logistic Models	en_US
dc.subject.mesh	Risk Factors	en_US
dc.subject.mesh	Predictive Value of Tests	en_US
dc.subject.mesh	Algorithms	en_US
dc.subject.mesh	Neural Networks (Computer)	en_US
dc.subject.mesh	Middle Aged	en_US
dc.subject.mesh	Female	en_US
dc.subject.mesh	Male	en_US
dc.subject.mesh	Machine Learning	en_US
dc.subject.mesh	Support Vector Machine	en_US
dc.subject.mesh	Neural Networks, Computer	en_US
dc.title	Identifying people at risk of developing type 2 diabetes: A comparison of predictive analytics techniques and predictor variables	en_US
dc.type	Journal Article
utslib.citation.volume	119	en_US
utslib.for	080301 Bioinformatics Software	en_US
utslib.for	0801 Artificial Intelligence and Image Processing	en_US
utslib.for	08 Information and Computing Sciences	en_US
utslib.for	09 Engineering	en_US
utslib.for	11 Medical and Health Sciences	en_US
pubs.embargo.period	Not known	en_US
pubs.organisational-group	/University of Technology Sydney
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	/University of Technology Sydney/Faculty of Engineering and Information Technology/School of Computer Science
pubs.organisational-group	/University of Technology Sydney/Strength - HCTD - Human Centred Technology Design
utslib.copyright.status	closed_access
pubs.publication-status	Published	en_US
pubs.volume	119	en_US
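
Illustrative sketch (not part of the repository record): the abstract's Methods name two class-balancing strategies and several evaluation measures without giving formulas. The Python below is a minimal, standard-library sketch of how random undersampling, a simplified SMOTE-style oversampling step, and three of the measures (Sensitivity, Specificity, Matthews correlation coefficient) are commonly computed. It is not the authors' implementation. All function names, the tuple-of-floats row format, and the default k are assumptions made for illustration, and the oversampler is a simplification of synthetic minority oversampling rather than a faithful reproduction of any library.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels (assumed coding: 1 = develops T2D)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Share of true positives among all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Share of true negatives among all actual negatives."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def random_undersample(majority, minority, rng):
    """Keep a random subset of majority rows equal in size to the minority.

    Assumes len(minority) <= len(majority).
    """
    return rng.sample(majority, len(minority)), list(minority)


def smote_like_oversample(majority, minority, rng, k=3):
    """Add synthetic minority rows until both classes are the same size.

    Each synthetic row is a random point on the segment between a minority
    row and one of its k nearest minority neighbours (Euclidean distance).
    Assumes at least two minority rows; rows are tuples of floats.
    """
    synthetic = []
    while len(minority) + len(synthetic) < len(majority):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [row for j, row in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda row: math.dist(base, row))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return list(majority), list(minority) + synthetic
```

As a made-up worked example: with true labels [1, 1, 1, 0, 0, 0, 0, 0] and predictions [1, 1, 0, 0, 0, 0, 1, 0], the counts are TP = 2, TN = 4, FP = 1, FN = 1, giving a sensitivity of 2/3, a specificity of 0.8, and an MCC of 7/15 (about 0.467). In a real evaluation, resampling would normally be applied to the training split only, so that test-set measures reflect the original class balance.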

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/132041