Big Data/Machine Learning/AI
Generalization strategies for improving machine learning predictions: a multicentric analysis of neonatal mortality in low- and middle-income countries
Gabriel Silva*, Alexandre Dias Porto Chiavegatto Filho
Neonatal mortality remains a critical concern, particularly in low- and middle-income countries. We conducted a comprehensive evaluation of the performance of machine learning (ML) algorithms in predicting neonatal mortality risk. Using data from the Maternal and Neonatal Health Registry (MNHR, National Institutes of Health), which covers a multicentric neonatal cohort from eight countries (Argentina, Bangladesh, Kenya, Democratic Republic of Congo, Pakistan, Guatemala, India, and Zambia), this study aimed to identify and compare training strategies that improve the predictive performance of ML algorithms on multicentric neonatal data. We explored three training frameworks: 1) a single algorithm trained on data from all participating countries, 2) country-specific algorithms trained locally, and 3) an approach using the largest country-specific training sample. We trained the algorithms on data from 2010 to 2016 and tested their predictive performance on data from 2017 to 2019. Five ML algorithms (XGBoost, LightGBM, CatBoost, AdaBoost, and random forest) were trained with the five fundamental indicators recommended by the World Health Organization (WHO): maternal age, place of delivery, type of delivery, birth weight, and gestational age. The primary outcome was neonatal mortality, covering the period from the day of birth to the 42nd day after delivery. Our findings suggest that a generalized model, trained on pooled data from all participating countries, achieved the best predictive performance, with an AUC-ROC of 0.811, a recall of 0.212, and a specificity of 0.997. This study highlights the potential of multi-country ML for improving neonatal health decisions in low- and middle-income countries and helps pave the way for more globally inclusive digital health strategies.
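The temporal split, the three training frameworks, and the reported metrics can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' pipeline: the record fields (`country`, `year`, `died`), the function names, and the toy data are hypothetical, and reading framework 3 as "train only on the country with the most training records" is our assumption. Model fitting with the five algorithms is deliberately left out; the sketch only shows which records each framework would train on and how the three metrics are computed.

```python
from collections import Counter


def temporal_split(records, last_train_year=2016):
    """Train on births up to last_train_year, test on later births."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


def training_sets(train, target_country):
    """Training records each framework would use for one target country."""
    counts = Counter(r["country"] for r in train)
    largest = counts.most_common(1)[0][0]  # country with most training rows
    return {
        "pooled": list(train),  # framework 1: all countries together
        "local": [r for r in train if r["country"] == target_country],  # framework 2
        "largest": [r for r in train if r["country"] == largest],  # framework 3 (assumed reading)
    }


def recall_specificity(y_true, y_pred):
    """Recall = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return recall, specificity


def auc_roc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count one half). Quadratic in sample size; fine for a sketch."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy records, invented purely for illustration.
toy = [
    {"country": "India", "year": 2015, "died": 0},
    {"country": "India", "year": 2016, "died": 1},
    {"country": "Kenya", "year": 2014, "died": 0},
    {"country": "Kenya", "year": 2018, "died": 1},
    {"country": "India", "year": 2019, "died": 0},
]
toy_train, toy_test = temporal_split(toy)
toy_sets = training_sets(toy_train, target_country="Kenya")
```

Recall and specificity depend on the classification threshold applied to the predicted risk, whereas AUC-ROC does not, so a high AUC-ROC can coexist with low recall when the threshold is set to keep specificity very high.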