Estimating Causal Effects of Community Drivers of Individual Health in the Presence of Unmeasured Between-Community Confounding in Large Data Settings

Presenting Author

John Halifax

UC Berkeley, School of Public Health

Submitting Author

John Halifax

Additional Authors

Laura B. Balzer, Jennifer Ahern

Abstract

Community characteristics, such as neighborhood violence, have been hypothesized to drive health and health disparities. However, their causal effects remain insufficiently elucidated, in part because of analytical challenges: clustering within communities reduces the effective sample size; confounding occurs at multiple levels and is high-dimensional; and unmeasured confounding between communities is often left unaddressed. Additional challenges arise when using registry data. The large number of clusters makes fixed effects regression (which addresses between-community confounding) computationally infeasible, while individual clusters often have too few observations to support community-stratified estimation that resolves within-community confounding.

We extend targeted maximum likelihood estimation (TMLE) to evaluate community effects with registry-scale data, using pseudo-demeaned logistic regression to learn the outcome and exposure mechanisms while addressing unmeasured between-community confounding. We evaluated performance with two synthetic multilevel, registry-scale data generating processes (DGPs), one linear and one non-linear, each with unmeasured between-community confounding (U). We compared six TMLEs using different learners: 1) oracle learners with correctly specified regressions including U, as a benchmark; 2) naïve learners not accounting for between-community confounding; 3) dummy variable fixed effect learners; 4) random intercept learners; 5) pseudo-demeaned learners; and 6) super learned ensembles of pseudo-demeaned regressions.
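The abstract does not define the pseudo-demeaned regression, so the following is only a minimal sketch under a stated assumption: that it resembles a Mundlak-style device in which each covariate is split into its community mean and its within-community deviation, and both enter an ordinary logistic regression. This may differ from the authors' construction. The function names (`cluster_demean`, `fit_logistic`), the toy data generating process, and all parameter values are hypothetical and chosen for illustration; only NumPy is used.

```python
import numpy as np


def cluster_demean(X, cluster_ids):
    """Split X into within-cluster deviations and row-aligned cluster means."""
    # Map arbitrary cluster labels to 0..K-1 so bincount can aggregate by cluster.
    _, idx = np.unique(np.ravel(cluster_ids), return_inverse=True)
    counts = np.bincount(idx)
    means = np.column_stack(
        [np.bincount(idx, weights=X[:, j]) / counts for j in range(X.shape[1])]
    )
    return X - means[idx], means[idx]


def fit_logistic(X, y, n_iter=50, tol=1e-8):
    """Logistic regression with an intercept, fitted by Newton-Raphson."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
        grad = Xd.T @ (y - p)
        hess = (Xd * (p * (1.0 - p))[:, None]).T @ Xd
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical toy data: u is a community-level confounder that is never
# given to the learners; x varies both between and within communities.
rng = np.random.default_rng(0)
n_clusters, cluster_size = 500, 20
cluster_ids = np.repeat(np.arange(n_clusters), cluster_size)
u = rng.normal(size=n_clusters)[cluster_ids]
x = u + rng.normal(size=u.size)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.5 * x + u))))

X = x[:, None]
X_within, X_mean = cluster_demean(X, cluster_ids)

# Naive learner: ignores between-community confounding entirely.
beta_naive = fit_logistic(X, y)
# Assumed pseudo-demeaned learner: within-community deviation plus community mean.
beta_pd = fit_logistic(np.column_stack([X_within, X_mean]), y)

print("naive slope on x:          ", round(float(beta_naive[1]), 3))
print("slope on within-deviation: ", round(float(beta_pd[1]), 3))
```

In this toy setting the within-community deviation is unrelated to `u` by construction, so its slope should land near the simulated value of 0.5, while the naive slope absorbs part of the effect of `u`. The sketch needs only one extra column per covariate rather than one dummy variable per community, which is the scalability argument the abstract makes; it does not include the TMLE targeting step or the super learner ensemble.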

In the linear DGP, both the pseudo-demeaned and the super learned pseudo-demeaned TMLEs matched the performance of the oracle estimator. In the non-linear DGP, only the super learned pseudo-demeaned TMLE matched oracle performance. The TMLE with dummy variable fixed effects failed computationally, and the random intercept TMLE failed to converge before a user-set time limit. These results suggest that the super learned pseudo-demeaned TMLE is a flexible, scalable estimator for community determinants of health.
