Big Data/Machine Learning/AI
Machine Learning and racial disparities in health
Keisyanne de Araujo Moura*, Letícia Gabrielle Souza, Augusto César Ferreira de Moraes, Alexandre D. P. Chiavegatto Filho
Background: Machine learning (ML) models are increasingly being developed to predict health outcomes, and their implementation in healthcare has shown growing promise. However, algorithms can inherit biases present in their training datasets, leading to algorithmic discrimination and posing a potentially far-reaching threat to equity in healthcare.
Methods: We conducted a systematic literature review, registered in PROSPERO, evaluating and synthesizing published articles. Searches were conducted in major databases. We included original research articles that tested the use of ML, reported algorithm performance criteria, and focused on racial disparities, applying well-defined inclusion and exclusion criteria.
Results: The initial search yielded 3,204 potentially eligible titles, which were refined after applying the inclusion and exclusion criteria, resulting in 13 articles selected for evaluation. Regarding the training-test division, six articles used k-fold cross-validation and seven studies used a holdout split between training and testing, with training percentages ranging from 60% to 88.15% and test percentages from 20% to 31.2%. The main algorithms used were XGBoost, Random Forest, and MSE, with AUCs ranging from 0.71 to 0.94. The studies identified a wide range of issues related to racial disparities in machine learning, including differing predictive accuracy for racial minorities, inequalities arising from the choice of performance metric, and large racial imbalances in training data.
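The subgroup performance gaps described above are typically surfaced by computing a metric such as AUC separately for each racial group rather than only in aggregate. The sketch below illustrates this kind of check; it is not taken from any of the reviewed studies. The synthetic data, the binary stand-in for a sensitive attribute, the 70/30 split, and the Random Forest settings are all assumptions for illustration.

```python
# Illustrative sketch (assumed setup, not from the reviewed studies):
# evaluate a classifier's AUC overall and per subgroup to expose gaps.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 5))            # synthetic clinical features
group = rng.integers(0, 2, size=n)     # 0/1 stand-in for a sensitive attribute
# Make the outcome signal weaker for group 1, mimicking training-data imbalance
signal = X[:, 0] + 0.5 * X[:, 1] * (group == 0)
y = (signal + rng.normal(scale=1.0, size=n) > 0).astype(int)

# Holdout split (70/30 here, within the 60-88% training range reported above)
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]

print(f"Overall AUC: {roc_auc_score(y_te, proba):.3f}")
for g in (0, 1):
    mask = g_te == g
    print(f"Group {g} AUC: {roc_auc_score(y_te[mask], proba[mask]):.3f}")
```

An aggregate AUC in the 0.71-0.94 range can mask a much lower subgroup AUC, which is exactly the kind of disparity the reviewed studies report.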
Conclusions: A central concern lies in training algorithms on data that mirrors societal biases, which can lead to unfair outcomes and racial disparities in healthcare. It is therefore crucial to conduct a thorough analysis of the biases embedded during the development of artificial intelligence models, taking into account sensitive variables such as race and ethnicity.