XGBoost-SHAP-based interpretable diagnostic framework for knee osteoarthritis: a population-based retrospective cohort study.

Fan, Z; Song, W; Ke, Y; Jia, L; Li, S; Li, JJ; Zhang, Y; Lin, J; Wang, B

Publisher: Springer Nature
Publication Type: Journal Article
Citation: Arthritis Res Ther, 2024, 26, (1), pp. 213
Issue Date: 2024-12-19

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Fan, Z
dc.contributor.author	Song, W
dc.contributor.author	Ke, Y
dc.contributor.author	Jia, L
dc.contributor.author	Li, S
dc.contributor.author	Li, JJ
dc.contributor.author	Zhang, Y
dc.contributor.author	Lin, J
dc.contributor.author	Wang, B
dc.date.accessioned	2025-01-05T21:52:58Z
dc.date.available	2024-12-01
dc.date.available	2025-01-05T21:52:58Z
dc.date.issued	2024-12-19
dc.identifier.citation	Arthritis Res Ther, 2024, 26, (1), pp. 213
dc.identifier.issn	1478-6354
dc.identifier.issn	1478-6362
dc.identifier.uri	http://hdl.handle.net/10453/182965
dc.description.abstract	OBJECTIVE: To use routine demographic and clinical data to develop an interpretable individual-level machine learning (ML) model to diagnose knee osteoarthritis (KOA) and to identify highly ranked features. METHODS: In this retrospective, population-based cohort study, anonymized questionnaire data were retrieved from the Wu Chuan KOA Study, Inner Mongolia, China. After feature selection, participants were divided in a 7:3 ratio into training and test sets. Class balancing was applied to the training set for data augmentation. Four ML classifiers were compared by cross-validation within the training set and their performance was further analyzed with an unseen test set. Classifications were evaluated using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, area under the curve (AUC), G-means, and F1 scores. The best model was explained using Shapley values to extract highly ranked features. RESULTS: A total of 1188 participants were investigated in this study, among whom 26.3% were diagnosed with KOA. Comparatively, XGBoost with Boruta exhibited the highest classification performance among the four models, with an AUC of 0.758, G-means of 0.800, and F1 scores of 0.703. The SHAP method revealed the top 17 features of KOA according to the importance ranking, and the average experience of joint pain was recognized as the most important feature. CONCLUSIONS: Our study highlights the usefulness of machine learning in unveiling important factors that influence the diagnosis of KOA to guide new prevention strategies. Further work is needed to validate this approach.
dc.format	Electronic
dc.language	eng
dc.publisher	Springer Nature
dc.relation.ispartof	Arthritis Res Ther
dc.relation.isbasedon	10.1186/s13075-024-03450-2
dc.rights	info:eu-repo/semantics/openAccess
dc.subject	1103 Clinical Sciences, 1107 Immunology, 1117 Public Health and Health Services
dc.subject.classification	Arthritis & Rheumatology
dc.subject.classification	3202 Clinical sciences
dc.subject.classification	3204 Immunology
dc.subject.mesh	Humans
dc.subject.mesh	Osteoarthritis, Knee
dc.subject.mesh	Retrospective Studies
dc.subject.mesh	Female
dc.subject.mesh	Male
dc.subject.mesh	Middle Aged
dc.subject.mesh	Machine Learning
dc.subject.mesh	Aged
dc.subject.mesh	Cohort Studies
dc.subject.mesh	China
dc.subject.mesh	Adult
dc.subject.mesh	Surveys and Questionnaires
dc.title	XGBoost-SHAP-based interpretable diagnostic framework for knee osteoarthritis: a population-based retrospective cohort study.
dc.type	Journal Article
utslib.citation.volume	26
utslib.location.activity	England
utslib.for	1103 Clinical Sciences
utslib.for	1107 Immunology
utslib.for	1117 Public Health and Health Services
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology
pubs.organisational-group	University of Technology Sydney/Faculty of Engineering and Information Technology/School of Biomedical Engineering
utslib.copyright.status	open_access	*
dc.rights.license	This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.date.updated	2025-01-05T21:52:55Z
pubs.issue	1
pubs.publication-status	Published online
pubs.volume	26
utslib.citation.issue	1

Abstract:

OBJECTIVE: To use routine demographic and clinical data to develop an interpretable individual-level machine learning (ML) model to diagnose knee osteoarthritis (KOA) and to identify highly ranked features.

METHODS: In this retrospective, population-based cohort study, anonymized questionnaire data were retrieved from the Wu Chuan KOA Study, Inner Mongolia, China. After feature selection, participants were divided in a 7:3 ratio into training and test sets. Class balancing was applied to the training set for data augmentation. Four ML classifiers were compared by cross-validation within the training set and their performance was further analyzed with an unseen test set. Classifications were evaluated using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, area under the curve (AUC), G-means, and F1 scores. The best model was explained using Shapley values to extract highly ranked features.

RESULTS: A total of 1188 participants were investigated in this study, among whom 26.3% were diagnosed with KOA. Comparatively, XGBoost with Boruta exhibited the highest classification performance among the four models, with an AUC of 0.758, G-means of 0.800, and F1 scores of 0.703. The SHAP method revealed the top 17 features of KOA according to the importance ranking, and the average experience of joint pain was recognized as the most important feature.

CONCLUSIONS: Our study highlights the usefulness of machine learning in unveiling important factors that influence the diagnosis of KOA to guide new prevention strategies. Further work is needed to validate this approach.
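The threshold-based evaluation metrics named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch, not the authors' code: the function name `classification_metrics`, the helper `_ratio`, and the example counts are hypothetical. It assumes the G-mean is the geometric mean of sensitivity and specificity, which is the usual definition but is not stated in the abstract. AUC is omitted because it requires predicted scores rather than counts.

```python
from math import sqrt


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute threshold-based metrics from binary confusion-matrix counts.

    tp/fn: positive (KOA) cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    sensitivity = _ratio(tp, tp + fn)  # recall on the positive class
    specificity = _ratio(tn, tn + fp)  # recall on the negative class
    ppv = _ratio(tp, tp + fp)          # positive predictive value (precision)
    npv = _ratio(tn, tn + fn)          # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    # Assumed definition: geometric mean of sensitivity and specificity.
    g_mean = sqrt(sensitivity * specificity)
    # F1 is the harmonic mean of precision (PPV) and recall (sensitivity).
    f1 = _ratio(2 * ppv * sensitivity, ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "g_mean": g_mean,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are not study results.
    metrics = classification_metrics(tp=70, fp=40, tn=160, fn=30)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
```

With imbalanced classes such as the 26.3% KOA prevalence reported here, accuracy alone can look good for a model that rarely predicts the minority class. The G-mean and F1 score both fall toward zero in that case, which is one plausible reason to report them alongside accuracy.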

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/182965