Derivation and validation of a clinical predictive model for longer duration diarrhea among pediatric patients in Kenya using machine learning algorithms

Ogwel, Billy; Mzazi, Vincent H.; Awuor, Alex O.; Okonji, Caleb; Anyango, Raphael O.; Oreso, Caren; Ochieng, John B.; Munga, Stephen; Nasrin, Dilruba; Tickell, Kirkby D.; Pavlinac, Patricia B.; Kotloff, Karen L.; Omore, Richard

dc.contributor.author	Ogwel, Billy
dc.contributor.author	Mzazi, Vincent H.
dc.contributor.author	Awuor, Alex O.
dc.contributor.author	Okonji, Caleb
dc.contributor.author	Anyango, Raphael O.
dc.contributor.author	Oreso, Caren
dc.contributor.author	Ochieng, John B.
dc.contributor.author	Munga, Stephen
dc.contributor.author	Nasrin, Dilruba
dc.contributor.author	Tickell, Kirkby D.
dc.contributor.author	Pavlinac, Patricia B.
dc.contributor.author	Kotloff, Karen L.
dc.contributor.author	Omore, Richard
dc.date.accessioned	2025-02-01T04:41:21Z
dc.date.available	2025-02-01T04:41:21Z
dc.date.issued	2025-01-15
dc.identifier.citation	BMC Medical Informatics and Decision Making. 2025 Jan 15;25(1):28
dc.identifier.uri	https://doi.org/10.1186/s12911-025-02855-6
dc.identifier.uri	https://hdl.handle.net/10500/32073
dc.description.abstract	Background: Despite the adverse health outcomes associated with longer duration diarrhea (LDD), there are currently no clinical decision tools for timely identification and better management of children at increased risk. This study uses machine learning (ML) to derive and validate a predictive model for LDD among children presenting with diarrhea to health facilities. Methods: LDD was defined as a diarrhea episode lasting ≥ 7 days. We used 7 ML algorithms to build prognostic models for the prediction of LDD among children < 5 years, using de-identified data from the Vaccine Impact on Diarrhea in Africa study (N = 1,482) for model development and data from the Enterics for Global Health Shigella study (N = 682) for temporal validation of the champion model. Features included demographic, medical history and clinical examination data collected at enrolment in both studies. We conducted split-sampling and employed K-fold cross-validation with an over-sampling technique during model development. Critical predictors of LDD and their impact on prediction were obtained using an explainable, model-agnostic approach. The champion model was selected based on the area under the curve (AUC) metric. Model calibration was assessed using the Brier score and Spiegelhalter’s z-test with its accompanying p-value. Results: There was a significant difference in the prevalence of LDD between the development and temporal validation cohorts (478 [32.3%] vs 69 [10.1%]; p < 0.001). The following variables were associated with LDD, in decreasing order: pre-enrolment diarrhea days (55.1%), modified Vesikari score (18.2%), age group (10.7%), vomit days (8.8%), respiratory rate (6.5%), vomiting (6.4%), vomit frequency (6.2%), rotavirus vaccination (6.1%), skin pinch (2.4%) and stool frequency (2.4%).
While all models showed good prediction capability, the random forest model achieved the best performance (AUC [95% Confidence Interval]: 83.0 [78.6–87.5] and 71.0 [62.5–79.4] on the development and temporal validation datasets, respectively). Although the random forest model showed slight deviations from perfect calibration, these deviations were not statistically significant (Brier score = 0.17, Spiegelhalter p-value = 0.219). Conclusions: Our study suggests that ML-derived algorithms could be used to rapidly identify children at increased risk of LDD. Integrating ML-derived models into clinical decision-making may allow clinicians to target these children with closer observation and enhanced management.
dc.title	Derivation and validation of a clinical predictive model for longer duration diarrhea among pediatric patients in Kenya using machine learning algorithms
dc.type	Journal Article
dc.date.updated	2025-02-01T04:41:22Z
dc.language.rfc3066	en
dc.rights.holder	The Author(s)
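
The abstract reports calibration through the Brier score and Spiegelhalter's z-test. As a minimal sketch of how these two quantities are commonly computed from binary outcomes and predicted probabilities (the function names `brier_score` and `spiegelhalter_z` and the toy data are illustrative assumptions, not the authors' code or data):

```python
import math


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def spiegelhalter_z(y_true, y_prob):
    """Spiegelhalter's z statistic and its two-sided p-value.

    Under the null hypothesis of perfect calibration, z is approximately
    standard normal, so a large p-value gives no evidence of miscalibration.
    """
    numerator = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, y_prob))
    variance = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in y_prob)
    if variance == 0:
        # Happens when every prediction is exactly 0, 0.5 or 1.
        raise ValueError("z statistic is undefined for these probabilities")
    z = numerator / math.sqrt(variance)
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Toy example (hypothetical values, not study data).
outcomes = [1, 0, 1, 0]
predicted = [0.8, 0.2, 0.6, 0.4]
score = brier_score(outcomes, predicted)
z_stat, p_val = spiegelhalter_z(outcomes, predicted)
```

Read this way, the reported Spiegelhalter p-value of 0.219 means the test did not detect a significant departure from perfect calibration, while the Brier score of 0.17 summarizes overall prediction error on a scale where lower is better.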