Methods/Statistics

Privacy-preserving record linkage in community settings: merging EMR & HMIS data

Paulina Kaiser*, Zoe Herrera, Carson Mowrer, Jonathan Ratliff, Cory Hackstedt, Barbara Hanley, Mark Edwards

The difficulty of merging siloed datasets, especially datasets that include identifiable information, is a major barrier in epidemiologic research. Privacy-preserving record linkage (PPRL) allows investigators to match individuals across datasets using cryptographically hashed information instead of direct identifiers. We used this technique to explore how characteristics of homelessness were associated with health status and healthcare utilization in a three-county suburban region in Oregon. We merged electronic medical record (EMR) data from a regional healthcare system with data from the local HUD-designated community action agency’s Homeless Management Information System (HMIS). The EMR dataset included measures of healthcare utilization (urgent care visits, ED visits, and hospitalizations) and total costs for members of Oregon’s Medicaid program. The HMIS dataset included assessments of social history and vulnerability. We merged on a hashed ID created from first name, last name, and date of birth, which matched 775 individuals across the two datasets at a similarity threshold of 0.85. Individuals in the merged dataset were 58% male and 88% white. Compared to individuals with a low vulnerability index score, those with a high score were more likely to have two or more ED visits in 2022 (48% vs 36%) and two or more urgent care visits (11% vs 6%). Individuals defined as chronically homeless were more likely than those not chronically homeless to have multiple ED visits (43% vs 29%) and multiple hospitalizations (9% vs 3%). Additional analyses are planned, including multiple regression models. PPRL offers a practical, low-cost way to overcome barriers to merging HIPAA-protected identifying information across siloed datasets. This technique is feasible in community-based settings and has the potential to inform targeted interventions to improve population health outcomes.
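
To illustrate how a hashed ID can be compared at a similarity threshold, the following is a minimal sketch of one common PPRL approach: a keyed Bloom-filter encoding of character bigrams, compared with the Dice coefficient. The abstract does not state which encoding or software was used, so the approach itself, the function names (`bigrams`, `encode`, `dice`), the filter size, the number of hash functions, and the example records are all assumptions made for illustration; only the three input fields and the 0.85 threshold come from the text above.

```python
import hashlib
import hmac

THRESHOLD = 0.85  # similarity threshold reported in the abstract


def bigrams(text):
    """Return the set of padded, lowercased character bigrams of a field."""
    padded = f"_{text.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(first, last, dob, secret, size=1024, num_hashes=4):
    """Hypothetical encoder: map a record to a set of Bloom-filter bit positions.

    Each bigram is hashed num_hashes times with a keyed HMAC-SHA256, so a party
    without the shared secret cannot recompute the encoding from guessed names.
    size and num_hashes are illustrative values, not taken from the study.
    """
    bits = set()
    for field in (first, last, dob):
        for gram in bigrams(field):
            for k in range(num_hashes):
                digest = hmac.new(secret, f"{k}:{gram}".encode(), hashlib.sha256).digest()
                bits.add(int.from_bytes(digest[:8], "big") % size)
    return frozenset(bits)


def dice(a, b):
    """Dice coefficient between two sets of bit positions (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Made-up records; each site would encode its own data with the shared secret.
secret = b"shared-secret-agreed-by-both-sites"
emr_record = encode("John", "Smith", "1980-01-15", secret)
hmis_same = encode("john", "SMITH", "1980-01-15", secret)
hmis_typo = encode("Jon", "Smith", "1980-01-15", secret)
hmis_other = encode("Maria", "Garcia", "1975-11-30", secret)

for label, candidate in [("same", hmis_same), ("typo", hmis_typo), ("other", hmis_other)]:
    score = dice(emr_record, candidate)
    print(label, round(score, 3), "match" if score >= THRESHOLD else "no match")
```

Because only the encoded bit positions are exchanged, neither site needs to see the other's names or dates of birth. A similarity score, rather than an exact hash comparison, lets records with minor spelling differences still link, which is the usual reason a threshold such as 0.85 appears in this kind of linkage.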