Identifying people at risk of developing type 2 diabetes: A comparison of predictive analytics techniques and predictor variables

Publication Type: Journal Article
International Journal of Medical Informatics, 2018, vol. 119, pp. 22–38
© 2018 Elsevier B.V.

Background: The present study aims to identify patients at risk of type 2 diabetes (T2D). A body of literature uses machine learning classification algorithms to predict the development of T2D among patients. The current study compares the performance of these classification algorithms in identifying patients at risk of developing T2D in the short, medium and long term. In addition, the predictor variables most important for predicting T2D progression are identified.

Methods: This study uses 10,911 records generated in 36 clinics between 15 November 2008 and 15 November 2016. Synthetic minority oversampling and random undersampling were used to create a balanced dataset. The performance of Neural Networks, Support Vector Machines, Decision Trees and Logistic Regression in identifying patients who develop T2D in the short, medium and long term was compared. The evaluation measures were Area Under the Curve, Sensitivity, Specificity, Matthews Correlation Coefficient and Mean Calibration Error. Through importance analysis and information fusion techniques, the predictors of developing T2D were identified for short-, medium- and long-term risk analysis.

Results: The findings show that the performance of the analytics techniques depends on both the prediction period and the prediction purpose, that is, whether the goal is to identify people who will not develop T2D or to identify at-risk patients. Oversampling, as opposed to undersampling, improved performance. Sixteen predictors were identified, together with their importance for determining patients at risk of T2D in the short, medium and long term.

Conclusions: This study provides guidelines for an automated system that prompts patients for screening. Several predictors can be reported by patients, while others can be examined by physicians or ordered as further laboratory tests, which offers a potential reduction in the burden placed on clinical settings.
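The evaluation measures named in the Methods can all be derived from a model's predictions. The following is a minimal Python sketch, not the study's implementation: it assumes binary labels (1 = develops T2D within the prediction horizon, 0 = does not), a hypothetical 0.5 decision threshold, and computes the Area Under the Curve with the rank (Mann-Whitney) formulation. All function names are hypothetical. Mean Calibration Error is left out because the abstract does not define it.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels (1 = develops T2D)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    # Share of patients who do develop T2D that the model flags as at risk.
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn, fp):
    # Share of patients who do not develop T2D that the model classifies correctly.
    return tn / (tn + fp) if tn + fp else 0.0


def mcc(tp, tn, fp, fn):
    # Matthews correlation coefficient; 0.0 by convention when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true, scores):
    """Probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with `y_true = [1, 1, 0, 0, 1, 0]` and `scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.6]` thresholded at 0.5, the counts are TP = 2, TN = 2, FP = 1, FN = 1, giving sensitivity and specificity of 2/3, an MCC of 1/3 and an AUC of 8/9. Sensitivity and specificity correspond to the two prediction purposes contrasted in the Results (finding at-risk patients versus ruling out those who will not develop T2D).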
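Synthetic minority oversampling (SMOTE) creates new minority-class records by interpolating between a minority record and one of its nearest minority-class neighbours, whereas random undersampling simply discards majority-class records. The sketch below illustrates both on plain numeric feature vectors. It assumes Euclidean distance, a default of k = 5 neighbours and a seeded random generator, and the helper names `smote` and `random_undersample` are hypothetical; it is not the procedure or software used in the study. A real analysis would normally use a maintained library implementation and resample only the training data.

```python
import random


def _sq_dist(a, b):
    # Squared Euclidean distance between two feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic minority records by interpolating between a random
    minority record and one of its k nearest minority-class neighbours."""
    rng = rng if rng is not None else random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority records")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours of base among the other minority records.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _sq_dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the segment base -> nb
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep n_keep majority records chosen uniformly at random without replacement."""
    rng = rng if rng is not None else random.Random(0)
    return rng.sample(majority, n_keep)
```

For instance, with 100 records of patients who develop T2D and 900 who do not, `smote(minority, 800)` would balance the classes at 900 each, while `random_undersample(majority, 100)` would balance them at 100 each. The abstract reports that the oversampling route gave better performance, which is plausible given that undersampling throws away most of the available records.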