Big Data/Machine Learning/AI
Neural Networks for the prediction of neonatal mortality in Rio de Janeiro State, Brazil
Renan Moritz Varnier R. Almeida*, Nubia Karla de Oliveira Almeida
Objective: To predict neonatal mortality using hospital administrative data by means of Logistic Regression (LR) and Multilayer Perceptron (MLP) models.
Methods: Records of 167,928 singleton births in Rio de Janeiro State hospitals (Brazil), 2019-2020, were obtained from a national administrative information system (Datasus). Fifteen variables describing the mother, the pregnancy and the newborn were identified and used as predictors of neonatal mortality. The data were randomly split into two subsets, a training set (70% of the data) and a testing set (the remaining 30%), and two models were developed on the training set: an LR and a one-hidden-layer MLP. Given the rarity of the outcome, the training set was also “balanced”: new training sets with a 30/70% proportion for the minority/majority outcome classes were created, and both models were also developed on them. The SMOTE-N (Synthetic Minority Oversampling Technique-Nominal), RO (Random Oversampling) and RU (Random Undersampling) algorithms were used for balancing. Model performance was evaluated on the testing subset using Accuracy, Precision, Recall (Sensitivity), F1-score, Specificity and AUC. Analyses were performed in R and Python, and a 5% significance level was adopted for the LR models.
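The random split and the two random balancing schemes (RO and RU) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the split is stratified by outcome (an assumption; the abstract only says the split was random), and records are handled as integer indices with a 0/1 outcome (1 = neonatal death). SMOTE-N is not reproduced, since it synthesizes new minority records from nearest neighbours over nominal features rather than copying existing ones.

```python
import random

def stratified_split(labels, train_frac=0.7, seed=0):
    """Hypothetical helper: split record indices into train/test sets,
    shuffling each outcome class separately (stratification is an
    assumption) so both sets keep the original death/survival proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

def random_oversample(idx_min, idx_maj, minority_share=0.3, seed=0):
    """Hypothetical RO sketch: keep every majority record and duplicate
    randomly chosen minority records (sampling with replacement) until the
    minority class makes up `minority_share` of the balanced set."""
    rng = random.Random(seed)
    target_min = round(len(idx_maj) * minority_share / (1 - minority_share))
    extra = [rng.choice(idx_min) for _ in range(target_min - len(idx_min))]
    return idx_min + extra + idx_maj

def random_undersample(idx_min, idx_maj, minority_share=0.3, seed=0):
    """Hypothetical RU sketch: keep every minority record and draw a random
    subset (without replacement) of majority records to reach the same ratio."""
    rng = random.Random(seed)
    target_maj = round(len(idx_min) * (1 - minority_share) / minority_share)
    return idx_min + rng.sample(idx_maj, target_maj)

# Toy data (not the study's): 1,000 records with a 0.6% outcome rate.
labels = [1] * 6 + [0] * 994
train, test = stratified_split(labels)
idx_min = [i for i in train if labels[i] == 1]
idx_maj = [i for i in train if labels[i] == 0]
ro = random_oversample(idx_min, idx_maj)
ru = random_undersample(idx_min, idx_maj)
print(len(train), len(test))                 # 700 300
print(sum(labels[i] for i in ro), len(ro))   # 298 994
print(sum(labels[i] for i in ru), len(ru))   # 4 13
```

As rough arithmetic with the reported figures: the 70% training set holds about 117,550 records; if the overall 0.6% mortality rate carries over, that is roughly 705 deaths and 116,845 survivors. A 30/70 ratio then means RU keeps the deaths and about 1,645 survivors, while RO keeps all survivors and grows the death records to about 50,076, nearly all of them duplicates. In Python, the imbalanced-learn package provides `RandomOverSampler`, `RandomUnderSampler` and `SMOTEN`; for a 30/70 target, their float `sampling_strategy` is the minority/majority ratio, 3/7 ≈ 0.43.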
Results: The overall neonatal mortality rate was 0.6%, and all predictors were retained in the LR models trained on the SMOTE-N and RO-balanced data (most important predictors: Apgar score at birth, gestational age and birthweight). In all scenarios, both the LR and the MLP had AUCs close to 0.88. Sensitivity was low on the unbalanced data and improved substantially after balancing (LR/MLP: 0.12/0.09 unbalanced; 0.59/0.33 with SMOTE-N; 0.72/0.62 with RO; 0.73/0.69 with RU).
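Why accuracy alone is uninformative for a 0.6% outcome, and how sensitivity, specificity and AUC relate to a confusion matrix, can be illustrated with a short standard-library Python sketch. The function names and toy labels are hypothetical, not the study's data; predictions are assumed to be 0/1 labels (1 = death) already obtained by thresholding model scores, and the AUC is computed through its rank (Mann-Whitney) interpretation: the probability that a randomly chosen death receives a higher score than a randomly chosen survivor, with ties counted as one half.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for a binary outcome (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1, "specificity": specificity}

def auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative),
    ties counting 1/2. O(n_pos * n_neg): fine for a sketch, slow for big sets."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A classifier that predicts "survives" for everyone, on toy data with a
# 0.6% death rate: high accuracy, zero sensitivity.
y_true = [1] * 6 + [0] * 994
print(classification_metrics(y_true, [0] * 1000))
# {'accuracy': 0.994, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0, 'specificity': 1.0}
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The all-negative classifier reaches 99.4% accuracy while detecting no deaths, so sensitivity and specificity show the effect of balancing far better than accuracy does. Because AUC depends only on how cases are ranked by score, not on a classification threshold, this may also help explain why AUC stayed near 0.88 while sensitivity changed with balancing.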
Conclusions: Both the Logistic Regression and the Multilayer Perceptron were good classifiers for the studied data (AUC close to 0.88). Balancing the training data improved the models' sensitivity.