Abstract:
Diarrhea is a global public health problem, with an incidence of about 1.7 billion childhood diarrheal episodes every year and an annual mortality of about 1.5 million. Its clinical consequences extend beyond acute dehydration and electrolyte imbalance and often include long duration diarrhea (LDD), chronic malnutrition, and mortality. This burden disproportionately affects low- and middle-income countries and is further exacerbated by delayed care-seeking, inadequate diagnostic capacity, demanding work environments, and provider burnout, which can impair clinical judgment and performance. Predictive models can augment clinical decision-making by enabling rapid identification of patients at increased risk of poor diarrheal outcomes, facilitating timely and cost-effective interventions to improve prognoses. The existing literature reveals a paucity of research on the development of predictive models for diarrheal outcomes. This study aimed to bridge this gap by: (i) identifying predictors of three poor diarrheal outcomes (LDD, chronic malnutrition, and mortality); (ii) deriving and validating patient-level predictive models for these outcomes; and (iii) designing an R-Shiny product suite for the resulting models. A correlational study design was adopted, with a hybrid feature selection strategy used to identify predictors. Seven machine learning algorithms were used for model development and evaluation, leveraging data from three pediatric enteric studies conducted in Siaya County, Kenya, between 2010 and 2023. Shapley values were estimated to enhance model interpretability, and the model with the best discrimination was selected as the champion model for each outcome. Clinical variables were the primary predictors of poor diarrheal outcomes, although the set of predictors varied with the outcome being modeled.
The champion models identified were: random forest for LDD
(AUC [95% CI]: 83.0% [78.6–87.5]); gradient boosting for chronic malnutrition (AUC [95% CI]: 83.5% [81.6–85.4]); and random forest for mortality (AUC [95% CI]: 82.6% [77.1–88.1]). During temporal validation, the model AUCs declined by 12% for LDD and 18% for chronic malnutrition. An R-Shiny web application was developed, featuring a consolidated interface that dynamically displays risk profiles and outcome-specific Shapley values once user inputs are submitted. This work demonstrates the practical utility of machine learning algorithms for rapidly identifying high-risk children in support of clinical decision-making, resource prioritization, and improved management, and it contributes to the growing body of literature on applying machine learning to pediatric risk prediction. However, successful implementation and widespread adoption of the tool will require further research, collaboration, and ethical oversight. Future research should therefore evaluate the clinical acceptability of these models and their impact on clinical practice and patient outcomes. It is also essential to assess the broader implications of ML integration, including operational challenges and the cost-effectiveness of model deployment, through a before-after study or a decision-analytic modeling framework.
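As an illustration of the champion-model selection summarized above (choosing, for each outcome, the model with the highest AUC and reporting a 95% CI), the following is a minimal Python sketch rather than the study's actual pipeline. The function names `auc`, `bootstrap_ci`, and `pick_champion`, the toy labels and scores, and the use of a percentile bootstrap for the interval are all assumptions made for illustration; the abstract does not state how the confidence intervals were computed.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample contains a single class, so AUC is undefined
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(round((alpha / 2) * n_boot))]
    upper = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lower, upper


def pick_champion(labels, model_scores):
    """Return the name of the model with the highest AUC."""
    return max(model_scores, key=lambda name: auc(labels, model_scores[name]))


if __name__ == "__main__":
    # Hypothetical held-out labels and predicted risks for two candidate models.
    y = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
    candidates = {
        "random_forest": [0.1, 0.3, 0.8, 0.7, 0.2, 0.9, 0.4, 0.6, 0.35, 0.55],
        "logistic_regression": [0.2, 0.6, 0.7, 0.4, 0.3, 0.8, 0.5, 0.45, 0.1, 0.65],
    }
    champion = pick_champion(y, candidates)
    low, high = bootstrap_ci(y, candidates[champion])
    print(champion, round(auc(y, candidates[champion]), 3), (round(low, 3), round(high, 3)))
```

In practice the same selection would be repeated separately for each outcome (LDD, chronic malnutrition, and mortality), since the abstract reports a distinct champion model per outcome.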