Methods/Statistics

Leveraging electronic health record data to evidence collider stratification bias and inform clinical epidemiology: Type-1 diabetes and rotator cuff tears

Ayush Giri*, Simone Herzberg, Katherine Hartmann, Hann-Heui Ong, Max Breyer, Wei-Qi Wei, Nitin Jain

Background: Electronic health record (EHR) data are increasingly popular for case-control investigations. Imaging is often required to confirm a diagnosis of rotator cuff tear and, because asymptomatic tears are estimated to be common, many argue that imaging should also be required for control selection. Leveraging de-identified EHR data to construct multiple control groups (with and without imaging), we evaluate evidence for selection bias through collider stratification, using type-1 diabetes (T1D) and rotator cuff tears as an illustrative example.

Methods: We developed EHR-based algorithms to identify imaging-verified cases and two control groups: one that required imaging to confirm the absence of a tear (control group 1) and one without that requirement (control group 2). We compared key characteristics between cases and controls, and between the two control groups, and performed multivariable logistic regression analyses to compare associations between T1D and rotator cuff tears across the two case-control designs.

Results: Cuff tear cases (Group 1) were older and more likely to have arthritis (57%), ligamentous disease (9%), and prior shoulder injury (99%) than either control group (Fig. 1). However, control group 1 resembled cases more closely than control group 2 did, with a higher proportion of arthritis (control group 1: 9% vs. control group 2: 1%), ligamentous disease (6% vs. 2%), and prior shoulder injury (54% vs. 7%). T1D was present in 3% of cases, 4% of control group 1, and 1% of control group 2. For comparison, the prevalence of T1D in the US is ~1%. T1D was positively associated with rotator cuff tears when using control group 2 (aOR = 1.78; 95% CI = 1.64-1.92) and inversely associated when using control group 1 (aOR = 0.75; 95% CI = 0.57-0.97).
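The change in direction is already visible in the crude T1D prevalences reported above. The following is a minimal sketch in Python that computes unadjusted odds ratios from the rounded percentages (3%, 4%, 1%). It is not the authors' multivariable model, the function names are illustrative, and because the inputs are rounded and unadjusted, the values are not expected to match the reported aORs.

```python
def odds(p):
    """Convert a proportion into odds."""
    return p / (1.0 - p)


def crude_odds_ratio(p_cases, p_controls):
    """Unadjusted exposure odds ratio from two exposure proportions."""
    return odds(p_cases) / odds(p_controls)


# Rounded T1D prevalences as reported in the Results.
t1d_cases = 0.03
t1d_control_group_1 = 0.04  # controls with imaging-confirmed absence of tear
t1d_control_group_2 = 0.01  # controls with no imaging requirement

or_vs_group_1 = crude_odds_ratio(t1d_cases, t1d_control_group_1)
or_vs_group_2 = crude_odds_ratio(t1d_cases, t1d_control_group_2)

print(f"Crude OR vs control group 1: {or_vs_group_1:.2f}")  # about 0.74
print(f"Crude OR vs control group 2: {or_vs_group_2:.2f}")  # about 3.06
```

The crude odds ratio falls below 1 against control group 1 and above 1 against control group 2, the same pattern of direction as the adjusted estimates.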

Discussion: Imaging in EHR data is ordered in response to symptoms and is therefore not systematic. Using multiple control groups in EHR data, we show how conditioning on imaging (a descendant of a collider, symptomatology) can distort the estimated exposure prevalence in controls enough to reverse conclusions.
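The mechanism can be illustrated with a toy probability model, sketched below in Python. Every parameter is a hypothetical value chosen for illustration, not an estimate from the study data. The model assumes that T1D and tears are independent in the population (true odds ratio of 1), that both raise the chance of shoulder symptoms, and that imaging depends only on symptoms. The sketch then compares exact quantities in the whole population with those in imaged patients only.

```python
from itertools import product

# Hypothetical parameters (assumptions, not study estimates).
P_T1D = 0.01                     # exposure prevalence
P_TEAR = 0.20                    # outcome prevalence, independent of T1D
P_SYMPTOM_BASE = 0.05            # symptoms from other causes
P_SYMPTOM_FROM_T1D = 0.15        # extra chance of symptoms due to T1D
P_SYMPTOM_FROM_TEAR = 0.60       # extra chance of symptoms due to a tear
P_IMAGING_IF_SYMPTOMS = 0.80
P_IMAGING_IF_NO_SYMPTOMS = 0.02


def p_symptoms(t1d, tear):
    """Noisy-OR model: symptoms are a collider of T1D and tear."""
    p_none = 1.0 - P_SYMPTOM_BASE
    if t1d:
        p_none *= 1.0 - P_SYMPTOM_FROM_T1D
    if tear:
        p_none *= 1.0 - P_SYMPTOM_FROM_TEAR
    return 1.0 - p_none


def p_imaging(t1d, tear):
    """Imaging is a descendant of the collider (depends only on symptoms)."""
    s = p_symptoms(t1d, tear)
    return s * P_IMAGING_IF_SYMPTOMS + (1.0 - s) * P_IMAGING_IF_NO_SYMPTOMS


def joint(t1d, tear, imaged_only):
    """P(T1D, tear), optionally multiplied by P(imaged | T1D, tear)."""
    p = (P_T1D if t1d else 1.0 - P_T1D) * (P_TEAR if tear else 1.0 - P_TEAR)
    return p * p_imaging(t1d, tear) if imaged_only else p


def odds_ratio(imaged_only):
    """Exact T1D-tear odds ratio from the four exposure-outcome cells."""
    cell = {(e, d): joint(e, d, imaged_only)
            for e, d in product((0, 1), repeat=2)}
    return (cell[1, 1] * cell[0, 0]) / (cell[1, 0] * cell[0, 1])


def t1d_prevalence_in_tear_free(imaged_only):
    """T1D prevalence among people without a tear (the control pool)."""
    exposed = joint(1, 0, imaged_only)
    unexposed = joint(0, 0, imaged_only)
    return exposed / (exposed + unexposed)


or_all = odds_ratio(imaged_only=False)
or_imaged = odds_ratio(imaged_only=True)
prev_all = t1d_prevalence_in_tear_free(imaged_only=False)
prev_imaged = t1d_prevalence_in_tear_free(imaged_only=True)

print(f"OR, whole population: {or_all:.2f}")
print(f"OR, imaged only:      {or_imaged:.2f}")
print(f"T1D in tear-free, all:    {prev_all:.1%}")
print(f"T1D in tear-free, imaged: {prev_imaged:.1%}")
```

Under these assumed values the odds ratio is 1 in the whole population but about 0.38 among imaged patients, and T1D prevalence among tear-free people rises from 1% overall to about 2.8% among those imaged. An inverse association appears purely from selection on imaging, with no causal effect of T1D on tears built into the model.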