Methods/Statistics
Disease Risk Score Methods for Studying Treatment Effect Heterogeneity in Real-World Data
Haedi Thelen*, University of Pennsylvania
Background: Stratification by an internally derived disease risk score (DRS) can be used to examine heterogeneity of treatment effect (HTE) and to control confounding. However, internal derivation methods developed in randomized trials require adaptation for real-world data, where confounding and DRS model transportability may bias stratum-specific treatment effect estimates.
Objective: To determine whether inverse probability weighting (IPW), applied to DRS estimation (to address DRS transportability) or to outcome modeling (to address confounding), reduces bias and improves the efficiency of treatment effects estimated within DRS strata.
Methods: We compared three strategies with increasing use of IPW: 1) none, 2) for treatment effect estimation only, and 3) for both DRS estimation and treatment effect estimation. We simulated one million individuals with a binary treatment (35% treated), 12 confounding covariates, and a binary outcome (15% incidence in the unexposed). The overall treatment effect was a risk difference of 0.08, and the stratum-specific effects ranged from 0.02 in the lowest-risk stratum to 0.18 in the highest-risk stratum. We repeatedly sampled 10,000 patients, fit a correctly specified DRS model, and compared the bias and root mean square error (RMSE) of the estimated treatment effect within DRS quartiles. To avoid data reuse bias, we used sample splitting: DRS models were fit on one half of the unexposed individuals, who were then excluded from effect estimation.
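The three strategies can be illustrated with a minimal sketch for a single simulated sample. Everything below is an assumption made for illustration and is not the authors' code: the data-generating process (coefficients, intercepts, and a treatment effect set to half the baseline risk), the helper names (`fit_logistic`, `simulate`, `stratum_rd`, `run_strategies`), the use of a Hajek-type weighted risk difference, and the choice to weight the DRS model by the same inverse probability weights so that the unexposed training half resembles the full sample.

```python
import numpy as np

rng = np.random.default_rng(0)

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, w=None, iters=25):
    """Weighted logistic regression by Newton-Raphson (X includes an intercept)."""
    w = np.ones(len(y)) if w is None else w
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = expit(X @ beta)
        grad = X.T @ (w * (y - p))
        hess = (X * (w * p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess + 1e-8 * np.eye(len(beta)), grad)
    return beta

def simulate(n, rng):
    """Hypothetical data: 12 confounders, roughly 35% treated, roughly 15% baseline risk."""
    X = rng.normal(size=(n, 12))
    A = rng.binomial(1, expit(-0.7 + X @ np.full(12, 0.15)))
    p0 = expit(-2.0 + X @ np.full(12, 0.2))   # risk if unexposed
    p1 = np.clip(1.5 * p0, 0, 1)              # effect grows with baseline risk
    Y = rng.binomial(1, np.where(A == 1, p1, p0))
    return X, A, Y

def stratum_rd(y, a, w):
    """Weighted risk difference, treated minus untreated."""
    t, c = a == 1, a == 0
    return np.average(y[t], weights=w[t]) - np.average(y[c], weights=w[c])

def run_strategies(X, A, Y, rng):
    n = len(Y)
    Xd = np.column_stack([np.ones(n), X])
    ps = expit(Xd @ fit_logistic(Xd, A))               # propensity score
    ipw = np.where(A == 1, 1 / ps, 1 / (1 - ps))
    # Sample splitting: half of the unexposed train the DRS model and are
    # then excluded from effect estimation.
    unexposed = np.flatnonzero(A == 0)
    train = rng.choice(unexposed, size=len(unexposed) // 2, replace=False)
    est = np.setdiff1d(np.arange(n), train)
    settings = [("1_none", False, False),
                ("2_ipw_effect", False, True),
                ("3_ipw_both", True, True)]
    results = {}
    for name, weight_drs, weight_effect in settings:
        b = fit_logistic(Xd[train], Y[train], ipw[train] if weight_drs else None)
        drs = expit(Xd[est] @ b)
        q = np.digitize(drs, np.quantile(drs, [0.25, 0.5, 0.75]))  # quartile 0..3
        w = ipw[est] if weight_effect else np.ones(len(est))
        results[name] = [stratum_rd(Y[est][q == k], A[est][q == k], w[q == k])
                         for k in range(4)]
    return results

X, A, Y = simulate(10_000, rng)
results = run_strategies(X, A, Y, rng)
for name, rds in results.items():
    print(name, np.round(rds, 3))
```

Each entry of `results` holds four risk differences, ordered from the lowest to the highest DRS quartile. Repeating the final three lines over many samples would give the replicate estimates needed for bias and RMSE.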
Results: Strategy 1 showed residual confounding (average percent bias across strata was 9.6%, with larger bias in the highest-risk stratum; see Figure). Strategy 2 reduced the average percent bias to 4.7% and improved RMSE. Strategy 3 did not further reduce bias or RMSE.
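For reference, the performance measures named above can be computed from replicate estimates as in the following sketch. The abstract does not give the exact formulas, so the definitions here are the conventional ones and are assumed rather than taken from the authors. The function names and the example numbers are hypothetical.

```python
import numpy as np

def percent_bias(estimates, truth):
    """Deviation of the mean estimate from the true effect, as a percentage of truth."""
    estimates = np.asarray(estimates, dtype=float)
    return 100.0 * (estimates.mean() - truth) / truth

def rmse(estimates, truth):
    """Root mean square error of the replicate estimates around the true effect."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - truth) ** 2)))

# Hypothetical replicate estimates for one stratum with a true risk difference of 0.10.
example_estimates = [0.09, 0.11, 0.10, 0.12]
example_pb = percent_bias(example_estimates, 0.10)
example_rmse = rmse(example_estimates, 0.10)
print(round(example_pb, 2), round(example_rmse, 4))
```

Applying both functions to each DRS quartile and then averaging the percent bias over the four quartiles would give one summary per strategy. Whether the authors averaged signed or absolute values is not stated.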
Conclusion: DRS stratification alone did not adequately control confounding within strata, whereas IPW of the outcome model substantially reduced bias. Weighting the DRS model did not further reduce bias or improve efficiency and may be unnecessary in this setting.

