Health Services/Policy
A call for guidance on when using multiple imputation to predict missing race and ethnicity data is ethical
Arshya Gurbani*, Karen Nielsen, Kevin Maloney, Jalyane Arias
The case for disaggregated and complete race and ethnicity data is clear: it is necessary to measure disparity in order to address it. Yet, despite guidance on the necessity of collecting data on race and ethnicity, large gaps remain in administrative healthcare and national data sets. Multiple imputation (MI) has gained popularity as an approach to avoiding biases in the presence of missing data. Some argue that the cost of collecting accurate race data at study onset may outweigh the risks of imputation, while others contend that intentional representation in studies and greater standardization of how race and ethnicity data are collected are a more sustainable route. Implementation guidance for MI focuses on the validity and sensitivity of specific MI methods. However, critical gaps remain in assuring the ethical adoption of imputation methods and in evaluating the consequences of their use. Ethical guidance is needed both to determine whether imputation is appropriate and to inform the subsequent dissemination of results. There is little guidance available from the government, funders, or academia on how to handle missing race/ethnicity data when conducting imputation. Compelling guidance from the Urban Institute calls for impact assessments of the decision to impute, community representation throughout the data life cycle, clearer informed consent policies for subsequent use of secondary and missing data, and thresholds of acceptable benefit/burden considerations. This call for guidance should be amplified from within academia as well: it is a conversation that epidemiologists and statisticians should have in conjunction with ethicists. This paper explores the extent of missing race data in large secondary health data sets, identifies gaps in established methods for imputing data, and makes the case for both ethical and statistical guidance when analyzing, and when communicating the results of, analyses involving imputed race data.