XGBoost-SHAP-based interpretable diagnostic framework for knee osteoarthritis: a population-based retrospective cohort study.
- Publisher:
- Springer Nature
- Publication Type:
- Journal Article
- Citation:
- Arthritis Res Ther, 2024, 26, (1), pp. 213
- Issue Date:
- 2024-12-19
This item is open access.
Full metadata record
Field | Value | Language |
---|---|---|
dc.contributor.author | Fan, Z | |
dc.contributor.author | Song, W | |
dc.contributor.author | Ke, Y | |
dc.contributor.author | Jia, L | |
dc.contributor.author | Li, S | |
dc.contributor.author | Li, JJ | |
dc.contributor.author | Zhang, Y | |
dc.contributor.author | Lin, J | |
dc.contributor.author | Wang, B | |
dc.date.accessioned | 2025-01-05T21:52:58Z | |
dc.date.available | 2024-12-01 | |
dc.date.available | 2025-01-05T21:52:58Z | |
dc.date.issued | 2024-12-19 | |
dc.identifier.citation | Arthritis Res Ther, 2024, 26, (1), pp. 213 | |
dc.identifier.issn | 1478-6354 | |
dc.identifier.issn | 1478-6362 | |
dc.identifier.uri | http://hdl.handle.net/10453/182965 | |
dc.description.abstract | OBJECTIVE: To use routine demographic and clinical data to develop an interpretable individual-level machine learning (ML) model to diagnose knee osteoarthritis (KOA) and to identify highly ranked features. METHODS: In this retrospective, population-based cohort study, anonymized questionnaire data were retrieved from the Wu Chuan KOA Study, Inner Mongolia, China. After feature selection, participants were divided in a 7:3 ratio into training and test sets. Class balancing was applied to the training set for data augmentation. Four ML classifiers were compared by cross-validation within the training set, and their performance was further analyzed with an unseen test set. Classifications were evaluated using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, area under the curve (AUC), G-mean, and F1 score. The best model was explained using Shapley values to extract highly ranked features. RESULTS: A total of 1188 participants were investigated in this study, among whom 26.3% were diagnosed with KOA. XGBoost with Boruta exhibited the highest classification performance among the four models, with an AUC of 0.758, a G-mean of 0.800, and an F1 score of 0.703. The SHAP method revealed the top 17 features of KOA according to the importance ranking, and the average of the experience of joint pain was recognized as the most important feature. CONCLUSIONS: Our study highlights the usefulness of machine learning in unveiling important factors that influence the diagnosis of KOA to guide new prevention strategies. Further work is needed to validate this approach. | |
dc.format | Electronic | |
dc.language | eng | |
dc.publisher | Springer Nature | |
dc.relation.ispartof | Arthritis Res Ther | |
dc.relation.isbasedon | 10.1186/s13075-024-03450-2 | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.subject | 1103 Clinical Sciences, 1107 Immunology, 1117 Public Health and Health Services | |
dc.subject.classification | Arthritis & Rheumatology | |
dc.subject.classification | 3202 Clinical sciences | |
dc.subject.classification | 3204 Immunology | |
dc.subject.mesh | Humans | |
dc.subject.mesh | Osteoarthritis, Knee | |
dc.subject.mesh | Retrospective Studies | |
dc.subject.mesh | Female | |
dc.subject.mesh | Male | |
dc.subject.mesh | Middle Aged | |
dc.subject.mesh | Machine Learning | |
dc.subject.mesh | Aged | |
dc.subject.mesh | Cohort Studies | |
dc.subject.mesh | China | |
dc.subject.mesh | Adult | |
dc.subject.mesh | Surveys and Questionnaires | |
dc.title | XGBoost-SHAP-based interpretable diagnostic framework for knee osteoarthritis: a population-based retrospective cohort study. | |
dc.type | Journal Article | |
utslib.citation.volume | 26 | |
utslib.location.activity | England | |
utslib.for | 1103 Clinical Sciences | |
utslib.for | 1107 Immunology | |
utslib.for | 1117 Public Health and Health Services | |
pubs.organisational-group | University of Technology Sydney | |
pubs.organisational-group | University of Technology Sydney/Faculty of Engineering and Information Technology | |
pubs.organisational-group | University of Technology Sydney/Faculty of Engineering and Information Technology/School of Biomedical Engineering | |
utslib.copyright.status | open_access | * |
dc.rights.license | This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.date.updated | 2025-01-05T21:52:55Z | |
pubs.issue | 1 | |
pubs.publication-status | Published online | |
pubs.volume | 26 | |
utslib.citation.issue | 1 |
Abstract:
OBJECTIVE: To use routine demographic and clinical data to develop an interpretable individual-level machine learning (ML) model to diagnose knee osteoarthritis (KOA) and to identify highly ranked features. METHODS: In this retrospective, population-based cohort study, anonymized questionnaire data were retrieved from the Wu Chuan KOA Study, Inner Mongolia, China. After feature selection, participants were divided in a 7:3 ratio into training and test sets. Class balancing was applied to the training set for data augmentation. Four ML classifiers were compared by cross-validation within the training set, and their performance was further analyzed with an unseen test set. Classifications were evaluated using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, area under the curve (AUC), G-mean, and F1 score. The best model was explained using Shapley values to extract highly ranked features. RESULTS: A total of 1188 participants were investigated in this study, among whom 26.3% were diagnosed with KOA. XGBoost with Boruta exhibited the highest classification performance among the four models, with an AUC of 0.758, a G-mean of 0.800, and an F1 score of 0.703. The SHAP method revealed the top 17 features of KOA according to the importance ranking, and the average of the experience of joint pain was recognized as the most important feature. CONCLUSIONS: Our study highlights the usefulness of machine learning in unveiling important factors that influence the diagnosis of KOA to guide new prevention strategies. Further work is needed to validate this approach.
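The threshold-based evaluation measures named in the abstract (sensitivity, specificity, positive and negative predictive value, accuracy, G-mean, and F1 score) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code: the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study. AUC is omitted because it depends on the ranking of predicted scores rather than on a single confusion matrix.

```python
import math


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute threshold-based metrics from confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives for the positive (KOA) class.
    """
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    ppv = _ratio(tp, tp + fp)           # positive predictive value (precision)
    npv = _ratio(tn, tn + fn)           # negative predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    # G-mean: geometric mean of sensitivity and specificity,
    # commonly reported for imbalanced classes.
    g_mean = math.sqrt(sensitivity * specificity)
    # F1: harmonic mean of precision and recall, written in count form.
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "g_mean": g_mean,
        "f1": f1,
    }


# Hypothetical counts for illustration only (NOT results from the study).
example = classification_metrics(tp=70, fp=50, tn=210, fn=30)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Because the G-mean multiplies sensitivity by specificity, it falls toward zero whenever a model neglects either class, which makes it a useful complement to accuracy when only about a quarter of participants are positive, as in this cohort.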
Please use this identifier to cite or link to this item: http://hdl.handle.net/10453/182965