Big Data/Machine Learning/AI
Federated learning for health outcomes predictions in a multicentric sample of hospitals Murilo Bigoto* Murilo Bigoto Alexandre Dias Porto Chiavegatto Filho
Using patient data to train predictive algorithms in healthcare raises technical and ethical challenges in privacy and security. Federated Learning (FL) is increasingly used to protect patient privacy, but significant hurdles persist. Our study aimed to develop COVID-19 mortality risk prediction algorithms across 21 Brazilian hospitals using different FL structures. Twenty-two predictors, including age, gender, vital signs, and hematological data, were assessed in two scenarios. The first scenario evaluated Logistic Regression (LR) and Multilayer Perceptron (MLP), with a global model created by averaging the coefficients from each hospital. The second scenario involved Random Forest (RF) and XGBoost (XGB), with the local trees aggregated into a global model. All models showed similar performance (AUC-ROC: LR 0.80, RF 0.79, XGB 0.78, MLP 0.80). FL outperformed local learning in both scenarios under the same hyperparameter space. Relative to local learning, the gains from FL were 6.5% for RF, 7.2% for LR, 5.5% for XGB, and 12.8% for MLP. The lower gains for RF and XGB reflect their stronger local performance. The predictive improvement was clearest for hospitals with fewer patients. When hospital data were aggregated by the five Brazilian regions, FL achieved a higher AUC-ROC (0.838) than local models (0.835), a 0.34% improvement. Our study highlights the potential of FL for predicting health outcomes from diverse hospital sources while preserving patient privacy and security. It shows the value of FL not only for improving local predictive performance but also for gains in generalization across different contexts.
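The coefficient averaging described for the first scenario can be illustrated with a minimal sketch. It is not the study's implementation: the function names, the toy parameter values, and the optional weighting by hospital size are assumptions made for illustration, since the abstract does not state whether the average was weighted or how many communication rounds were used.

```python
import math

def average_coefficients(local_models, weights=None):
    """Combine per-hospital logistic-regression parameters into one global model.

    local_models: list of (coefficients, intercept) tuples, one per hospital.
    weights: optional per-hospital weights (e.g. patient counts). This weighting
             is an assumption; with weights=None a plain mean is used.
    """
    if weights is None:
        weights = [1.0] * len(local_models)
    total = float(sum(weights))
    n_features = len(local_models[0][0])
    # Average each coefficient position across hospitals.
    global_coefs = [
        sum(w * coefs[j] for w, (coefs, _) in zip(weights, local_models)) / total
        for j in range(n_features)
    ]
    global_intercept = sum(w * b for w, (_, b) in zip(weights, local_models)) / total
    return global_coefs, global_intercept

def predict_risk(features, coefs, intercept):
    """Logistic-regression probability for one patient under the given parameters."""
    z = intercept + sum(c * x for c, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

# Made-up parameters for three hypothetical hospitals with two predictors each.
hospital_models = [
    ([0.04, -0.5], -3.0),
    ([0.06, -0.3], -4.0),
    ([0.05, -0.4], -3.5),
]
global_coefs, global_intercept = average_coefficients(hospital_models)
risk = predict_risk([70.0, 1.0], global_coefs, global_intercept)
```

Only the fitted parameters leave each hospital in this sketch; patient-level records stay local, which is the privacy property FL relies on.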