Big Data/Machine Learning/AI
Machine Learning for Predicting Neonatal Prematurity in Seven Low- and Middle-Income Countries: Analysis of the Maternal Newborn Health Registry (MNHR) Pregnancy Outcomes
Marianna Gerardo Hidalgo Santos Jorge Leite*, Gabriel Ferreira dos Santos Silva, Fabiano Novaes Barcelos, Carine Savalli, Alexandre Chiavegatto Filho
Introduction: Preterm birth contributes significantly to newborn mortality in developing nations and is largely attributed to preventable health conditions. Addressing this issue requires early intervention through health and public policies targeting high-risk pregnancies. Machine learning, a branch of artificial intelligence, has gained traction for predictive modeling in healthcare. For such algorithms to be effective and generalizable, they should rely on routinely collected data to identify pregnancies at risk of preterm birth.
Methods: We analyzed Global Network Maternal Newborn Health Registry (MNHR) data from 2017 to 2019 across seven low- and middle-income countries, comprising 138,303 pregnancies with a recorded term or preterm outcome. Starting from 47 variables available at 20 weeks of gestation, we applied Boruta-based feature selection, which retained 19 variables for model training. Target encoding was applied to qualitative variables and z-score normalization to quantitative variables. Four popular machine learning algorithms (XGBoost, CatBoost, LightGBM, and Random Forest) were trained and then evaluated on a held-out 30% test set.
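The two preprocessing steps named in Methods can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the toy site labels, and the visit counts are hypothetical, and fitting the statistics on the training portion only (with the global preterm rate as a fallback for unseen categories) is an assumption about common practice rather than something the abstract states.

```python
from statistics import mean, pstdev

def fit_target_encoder(categories, targets):
    """Learn, per category, the mean of the binary target in the training rows."""
    sums, counts = {}, {}
    for cat, y in zip(categories, targets):
        sums[cat] = sums.get(cat, 0.0) + y
        counts[cat] = counts.get(cat, 0) + 1
    mapping = {cat: sums[cat] / counts[cat] for cat in sums}
    global_mean = mean(targets)  # fallback for categories unseen in training
    return mapping, global_mean

def apply_target_encoder(categories, mapping, global_mean):
    """Replace each category by its learned target mean (or the fallback)."""
    return [mapping.get(cat, global_mean) for cat in categories]

def fit_zscore(values):
    """Learn the mean and population standard deviation from training values."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        sigma = 1.0  # avoid division by zero for constant columns
    return mu, sigma

def apply_zscore(values, mu, sigma):
    """Standardize values with the statistics learned on the training set."""
    return [(v - mu) / sigma for v in values]

# Toy data, invented for illustration only (1 = preterm, 0 = term).
train_site = ["A", "A", "B", "B", "B"]
train_preterm = [1, 0, 0, 0, 1]
train_visits = [1, 2, 3, 4, 5]

mapping, global_mean = fit_target_encoder(train_site, train_preterm)
mu, sigma = fit_zscore(train_visits)

# Held-out rows reuse the training statistics; site "C" was never seen.
test_site_encoded = apply_target_encoder(["A", "C"], mapping, global_mean)
test_visits_scaled = apply_zscore([3, 5], mu, sigma)
```

Learning the encoding and scaling parameters from the training rows alone keeps information from the test set out of the model inputs, which matters when performance is reported on a held-out split.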
Results: CatBoostClassifier yielded an area under the curve (AUC) of 0.7599 (sensitivity = 0.9882, specificity = 0.1390). Site-specific analysis revealed superior performance of LGBMClassifier at site 2, the Democratic Republic of Congo (AUC = 0.7829, sensitivity = 0.2385, specificity = 0.9732), and at site 8, Belagavi, India (AUC = 0.7863, sensitivity = 0.2412, specificity = 0.9868). Key predictive variables included the number of antenatal visits, early enrollment for follow-up, and antenatal care in the first trimester.
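For readers less familiar with the reported metrics, the sketch below shows how AUC, sensitivity, and specificity can be computed from true labels and predicted scores. The labels, scores, and thresholds are invented for illustration, and the abstract does not state which classification threshold was used, so the threshold values here are assumptions. AUC is computed as the fraction of (preterm, term) pairs in which the preterm case receives the higher score, with ties counted as one half.

```python
def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy labels and scores, invented for illustration (1 = preterm, 0 = term).
y_true = [1, 1, 1, 0, 0, 0, 0]
y_score = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

toy_auc = auc(y_true, y_score)
sens_default, spec_default = sensitivity_specificity(y_true, y_score, threshold=0.5)
sens_low, spec_low = sensitivity_specificity(y_true, y_score, threshold=0.25)
```

AUC does not depend on a threshold, whereas sensitivity and specificity do: in this toy example, lowering the threshold from 0.5 to 0.25 raises sensitivity and lowers specificity while the AUC stays the same.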
Conclusion: Machine learning algorithms effectively predicted prematurity using data available at 20 weeks of pregnancy from seven low- and middle-income countries. These tools can help identify mothers at risk of premature delivery, especially in low-income regions. Further research should validate this methodology in diverse settings with similar routinely collected data.