Explainable machine learning model for predicting cesarean section following induction of labor: Development and external validation using real-world data.

Hu, Y; Zhang, X; Slavin, V; Enticott, J; Callander, E

Publisher: Public Library of Science (PLoS)
Publication Type: Journal Article
Citation: PLOS Digit Health, 2025, 4, (11), pp. e0001061
Issue Date: 2025-11

This item is open access.

Full metadata record

Field	Value	Language
dc.contributor.author	Hu, Y https://orcid.org/0000-0003-1794-7789
dc.contributor.author	Zhang, X
dc.contributor.author	Slavin, V
dc.contributor.author	Enticott, J
dc.contributor.author	Callander, E
dc.contributor.editor	Leal-Neto, OB
dc.date.accessioned	2026-02-03T09:03:32Z
dc.date.available	2025-10-06
dc.date.available	2026-02-03T09:03:32Z
dc.date.issued	2025-11
dc.identifier.citation	PLOS Digit Health, 2025, 4, (11), pp. e0001061
dc.identifier.issn	2767-3170
dc.identifier.uri	http://hdl.handle.net/10453/192864
dc.format	Electronic-eCollection
dc.language	eng
dc.publisher	Public Library of Science (PLoS)
dc.relation.ispartof	PLOS Digit Health
dc.relation.isbasedon	10.1371/journal.pdig.0001061
dc.rights	info:eu-repo/semantics/openAccess
dc.title	Explainable machine learning model for predicting cesarean section following induction of labor: Development and external validation using real-world data.
dc.type	Journal Article
utslib.citation.volume	4
utslib.location.activity	United States
pubs.organisational-group	University of Technology Sydney
pubs.organisational-group	University of Technology Sydney/Faculty of Health
utslib.copyright.status	open_access	*
dc.rights.license	This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/
dc.date.updated	2026-02-03T09:03:30Z
pubs.issue	11
pubs.publication-status	Published online
pubs.volume	4
utslib.citation.issue	11

Abstract:

Induction of labor (IOL) is a common yet complex clinical procedure associated with varying risks, including cesarean section (CS). Accurate prediction models may help support more informed, personalized decision-making. This study aimed to develop and validate an explainable machine learning prediction model for CS following IOL. We used population-based administrative perinatal datasets from two Australian states (New South Wales (NSW) and Queensland) covering all births between 2016 and 2019 for model development. Temporal validation was conducted using 2020 births from NSW, and geographical validation using 2016-2018 births from Victoria. We included women with singleton, cephalic, term, live births who attempted IOL and had no prior CS. Seven models (logistic regression, random forest, gradient boosting, LightGBM, XGBoost, CatBoost, and AdaBoost) were developed with hyperparameter tuning and feature selection. Performance was assessed using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve, calibration plot (overall and across sociodemographic subgroups), decision curve analysis, Brier Score, and model parsimony. SHAP (SHapley Additive exPlanations) values were used to explain predictor contributions. A total of 180,700 women were included in model development (mean age 31 ± 5 years; CS = 20.8%). The optimal model, developed using XGBoost with ten predictors, achieved AUROCs of 0.76 (95% CI: 0.75-0.77) and 0.75 (95% CI: 0.74-0.76) in temporal (n = 14,527; CS = 22.5%) and geographical (n = 14,755; CS = 19.0%) validations, respectively. The most influential predictors were nulliparity, pre-pregnancy body mass index, and maternal age, while diabetes and hypertension (pre-existing or pregnancy-related) contributed least. Women with higher predicted CS probabilities had increased inpatient costs and maternal morbidity, regardless of actual mode of birth. 
The final model is accessible via an interactive web application (https://csai-8ccf2690242c.herokuapp.com/). This model demonstrates strong predictive performance using routinely collected maternal factors. Further co-design and implementation research is needed before potential clinical adoption.
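
The abstract lists AUROC, the Brier score, and decision curve analysis among the performance measures. The sketch below is a minimal illustration of what those three quantities compute; it is not the authors' code and does not reproduce any result from the study. The function names (`auroc`, `brier_score`, `net_benefit`) and the toy labels and probabilities are assumptions invented for this example, and only the Python standard library is used.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUROC as the chance that a random positive case is scored
    higher than a random negative case (ties count as half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit at one threshold probability, the quantity plotted
    in a decision curve: TP/n - FP/n * threshold / (1 - threshold)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


# Toy data, invented for illustration only (1 = CS, 0 = no CS).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.8, 0.3, 0.6, 0.4, 0.1, 0.35, 0.2, 0.7]

print("AUROC:", auroc(y_true, y_prob))
print("Brier score:", brier_score(y_true, y_prob))
print("Net benefit at 0.5:", net_benefit(y_true, y_prob, 0.5))
```

The pairwise AUROC loop is quadratic in the number of cases, which is fine for a toy example; a cohort the size of the one in this study would call for a rank-based implementation, and the reported 95% confidence intervals would need an additional procedure that is not sketched here.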

Please use this identifier to cite or link to this item:

http://hdl.handle.net/10453/192864