Big Data/Machine Learning/AI
Improving Fairness in Predictive Models: A Multicalibration Analysis Across Brazilian Hospitals
Fabiano Barcellos Filho*, Carine Savalli, Alexandre Chiavegatto
Accurate mortality prediction is essential for improving patient care and optimizing resource allocation. This study evaluated machine learning models, including XGBoost, CatBoost, and LightGBM, for predicting COVID-19 mortality across Brazilian regions, with a particular focus on how isotonic calibration affects model fairness. Hyperparameters were tuned by randomized search with AUC-ROC as the optimization metric, and the dataset was partitioned into training, validation, and test subsets, stratified by mortality outcome.
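The stratified partitioning and randomized search described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the 70/15/15 fractions, the function names `stratified_three_way_split` and `random_search`, and the `evaluate` callback (assumed to train a model such as XGBoost on the training subset and return its validation AUC-ROC) are all hypothetical, since the abstract gives neither the split proportions nor the search space. In practice, scikit-learn's `train_test_split(..., stratify=y)` and `RandomizedSearchCV(scoring="roc_auc")` cover the same ground.

```python
import random
from collections import defaultdict


def stratified_three_way_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists that each keep the outcome mix.

    The 70/15/15 default is a hypothetical choice, not the study's split.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train, val, test = [], [], []
    # Shuffle and slice each outcome class separately (this is the stratification).
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        n_train = round(n * fractions[0])
        n_val = round(n * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]
    return sorted(train), sorted(val), sorted(test)


def random_search(evaluate, space, n_iter=20, seed=0):
    """Sample n_iter configurations from `space`; keep the best-scoring one.

    `evaluate(config)` is a hypothetical callback assumed to return a
    validation AUC-ROC, where higher is better.
    """
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_iter):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    labels = [1] * 20 + [0] * 80  # toy data: 20 deaths among 100 patients
    train, val, test = stratified_three_way_split(labels)
    for name, idx in (("train", train), ("val", val), ("test", test)):
        print(name, len(idx), sum(labels[i] for i in idx) / len(idx))
```

Because each outcome class is shuffled and sliced on its own, every subset keeps the overall mortality rate: with the toy labels above, the training, validation, and test subsets each end up with a 20% mortality rate.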
Pre-calibration analysis revealed substantial subgroup disparities: the models consistently performed better for males and younger patients (<65 years) than for females and older adults (≥65 years). In the Northeast, AUC was 0.9663 for younger adults versus 0.7735 for older adults, while in the North it was 0.9735 for males versus 0.8250 for females. These disparities highlighted the need for calibration techniques that improve model fairness and ensure equitable performance across diverse demographic groups.
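The subgroup comparisons above amount to computing AUC-ROC separately within each demographic group. A minimal plain-Python sketch follows, using the rank-based definition of AUC: the probability that a randomly chosen death receives a higher score than a randomly chosen survivor, with ties counting half. The function names, the toy data, and the `max_auc_gap` summary are illustrative assumptions; the gap is one simple disparity measure, not necessarily the one used in the study. The O(n²) pairwise loop favors clarity over speed on hospital-scale data.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC-ROC as the probability that a random positive (death) outscores
    a random negative (survivor); tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def subgroup_aucs(labels, scores, groups):
    """Compute AUC separately within each subgroup (e.g. '<65' and '>=65')."""
    buckets = defaultdict(lambda: ([], []))
    for y, s, g in zip(labels, scores, groups):
        buckets[g][0].append(y)
        buckets[g][1].append(s)
    return {g: auc(ys, ss) for g, (ys, ss) in buckets.items()}


def max_auc_gap(group_aucs):
    """Largest AUC difference between any two subgroups (0 means parity)."""
    return max(group_aucs.values()) - min(group_aucs.values())


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 1, 0, 1, 0]
    scores = [0.9, 0.2, 0.8, 0.3, 0.4, 0.6, 0.8, 0.5]
    groups = ["<65"] * 4 + [">=65"] * 4
    per_group = subgroup_aucs(labels, scores, groups)
    print(per_group)               # {'<65': 1.0, '>=65': 0.5}
    print(max_auc_gap(per_group))  # 0.5
```

In the toy data the model ranks every younger patient correctly but only half of the older-patient pairs, so the same overall model shows a 0.5 AUC gap between age groups, the kind of disparity reported above.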
After isotonic calibration, fairness improved in all regions. In the North, AUC for females increased from 0.8250 to 0.9042, and performance for older adults, already exceptional, rose from 0.9886 to 0.9943. In the Northeast, AUC improved for older adults (0.7735 to 0.7989), narrowing the gap with younger adults, and for males (0.8848 to 0.9067). In the Southeast, disparities lessened, with AUC rising for males (0.8256 to 0.8421) and for older adults (0.7734 to 0.7959). The South showed the largest gain for older adults, whose AUC increased from 0.7723 to 0.8348, while performance for males remained high (0.9183 to 0.9422).
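Isotonic calibration fits a non-decreasing step function from raw model scores to observed event rates. The sketch below implements it with the pool-adjacent-violators (PAV) algorithm, assuming the mapping is fit on held-out scores (for example, the validation subset) and then applied to test scores. It is plain isotonic calibration on a single set of scores, not a full multicalibration procedure, and the abstract does not state whether calibration was fit globally, per region, or per subgroup; `fit_isotonic` and `predict_isotonic` are hypothetical names. Library equivalents include scikit-learn's `IsotonicRegression` and `CalibratedClassifierCV(method="isotonic")`, which interpolate linearly between steps instead of using the pure step lookup shown here.

```python
from bisect import bisect_right


def fit_isotonic(scores, labels):
    """Fit a non-decreasing step function from raw scores to event rates
    with the pool-adjacent-violators (PAV) algorithm.

    Returns (thresholds, values): the lowest raw score of each pooled block
    and the observed event rate inside that block.
    """
    # Pool identical raw scores first so tied inputs share one output.
    totals = {}
    for score, label in zip(scores, labels):
        entry = totals.setdefault(score, [0.0, 0])
        entry[0] += label
        entry[1] += 1

    blocks = []  # each block: [sum of labels, count, lowest score]
    for score in sorted(totals):
        label_sum, count = totals[score]
        blocks.append([label_sum, count, score])
        # Merge backwards while the previous block's mean exceeds the last
        # block's mean (cross-multiplied to avoid division).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            merged_sum, merged_count, _ = blocks.pop()
            blocks[-1][0] += merged_sum
            blocks[-1][1] += merged_count

    thresholds = [block[2] for block in blocks]
    values = [block[0] / block[1] for block in blocks]
    return thresholds, values


def predict_isotonic(model, score):
    """Return the calibrated value of the block containing `score`; scores
    outside the fitted range are clipped to the first or last block."""
    thresholds, values = model
    i = bisect_right(thresholds, score) - 1
    return values[max(i, 0)]


if __name__ == "__main__":
    model = fit_isotonic([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])
    print(model[1])                       # [0.0, 0.3333333333333333, 1.0, 1.0]
    print(predict_isotonic(model, 0.35))  # 0.3333333333333333
```

Because PAV pools adjacent scores into blocks that share one calibrated value, distinct raw scores can become ties (0.2, 0.3, and 0.4 all map to 1/3 in the example). Ties change the ranking that AUC measures, which is one way calibration can shift AUC slightly in either direction.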
We conclude that isotonic calibration, while slightly reducing AUC in some regions, consistently enhanced fairness across age and gender groups. This demonstrates its utility in promoting equity in predictive healthcare, striking a balance between performance and fairness to better serve diverse populations.