Social
Estimating Causal Effects of Community Drivers of Individual Health in the Presence of Unmeasured Between-Community Confounding in Large Data Settings
John Halifax*, UC Berkeley, School of Public Health
Community characteristics, such as neighborhood violence, have been hypothesized as drivers of health and health disparities. However, their causal effects remain insufficiently elucidated, in part due to analytical challenges: clustering within communities reduces the effective sample size; confounding occurs at multiple levels and is high-dimensional; and unmeasured confounding between communities is often unaddressed. Additional challenges arise when using registry data. The large number of clusters makes fixed effects regression (for between-community confounding) computationally infeasible, while individual clusters often have too few observations to support the community-stratified estimation that would resolve within-community confounding.
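As a rough illustration of how clustering reduces the effective sample size, the sketch below applies the standard design-effect approximation, deff = 1 + (m - 1) * ICC, for equal cluster sizes m. The function name and the example numbers are hypothetical and are not taken from the study.

```python
def effective_sample_size(n_clusters: int, cluster_size: int, icc: float) -> float:
    """Approximate effective sample size under equal-sized clusters.

    Uses the design effect deff = 1 + (m - 1) * icc, where m is the
    cluster size and icc is the intraclass correlation of the outcome.
    """
    design_effect = 1.0 + (cluster_size - 1) * icc
    return n_clusters * cluster_size / design_effect


# Hypothetical example: 100 communities of 50 people, ICC = 0.1.
n_eff = effective_sample_size(n_clusters=100, cluster_size=50, icc=0.1)
print(round(n_eff, 1))
```

With no within-community correlation (ICC = 0) the function returns the nominal sample size; with perfect correlation (ICC = 1) it returns the number of communities.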
We extend targeted maximum likelihood estimation (TMLE) to evaluate community effects with registry-scale data, using pseudo-demeaned logistic regression to learn the outcome and exposure mechanisms while addressing between-community unmeasured confounding. We evaluated performance with two synthetic multilevel, registry-scale data-generating processes (DGPs), one linear and one non-linear, each with between-community unmeasured confounding (U). We compared six TMLEs using different learners: 1) oracle learners with correctly specified regressions including U as a benchmark, 2) naïve learners not accounting for between-community confounding, 3) dummy variable fixed effect learners, 4) random intercept learners, 5) pseudo-demeaned learners, and 6) super learned ensembles of pseudo-demeaned regressions.
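The intuition behind demeaning can be sketched with a linear analogue. The example below is not the pseudo-demeaned logistic regression or the TMLE described above; it is a minimal simulation, under assumed parameter values, showing that subtracting community means removes a community-level unmeasured confounder U in a linear model without fitting one dummy variable per community. For illustration only, it assumes an exposure that varies within communities; all names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design: many small communities, as in registry data.
n_clusters, cluster_size = 2000, 5
cluster = np.repeat(np.arange(n_clusters), cluster_size)
n = cluster.size

# Unmeasured community-level confounder U (one value per community).
u = rng.normal(size=n_clusters)[cluster]

# Exposure varies within communities but is shifted by U (assumption).
a = u + rng.normal(size=n)

# Outcome: the true exposure effect is 1.0, and U also affects the outcome.
y = 1.0 * a + 2.0 * u + rng.normal(size=n)


def cluster_demean(x, cluster, n_clusters):
    """Subtract each community's mean from its members' values."""
    sums = np.bincount(cluster, weights=x, minlength=n_clusters)
    counts = np.bincount(cluster, minlength=n_clusters)
    return x - (sums / counts)[cluster]


def slope(x, y):
    """Simple least-squares slope of y on x (with intercept)."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


# Naive fit ignores U and is confounded between communities.
naive = slope(a, y)

# Within-community fit: demeaning cancels anything constant in a community.
within = slope(
    cluster_demean(a, cluster, n_clusters),
    cluster_demean(y, cluster, n_clusters),
)

print(f"naive slope:  {naive:.2f}")
print(f"within slope: {within:.2f}")
```

In this simulation the naive slope is biased away from the true value of 1.0, while the within-community slope recovers it approximately. In a logistic model demeaning does not remove U exactly, which is presumably why the approach above is called "pseudo" demeaning.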
The pseudo-demeaned and super learned pseudo-demeaned TMLEs matched the performance of the oracle estimator in the linear DGP. In the non-linear DGP, only the super learned pseudo-demeaned TMLE matched oracle performance. The TMLE with dummy variable fixed effects failed computationally, while the random intercept TMLE failed to converge before a user-set time limit. These results suggest that the super learned pseudo-demeaned TMLE is a flexible, scalable estimator for community determinants of health.

