Causal Inference
Simulations to improve the rigor & reproducibility of real-data applications
Nerissa Nance*, Maya Petersen, Mark van der Laan, Laura Balzer
The Roadmap for Causal Inference outlines a systematic approach to research: define the effect of interest, evaluate the needed assumptions, conduct statistical estimation, and carefully interpret results. At the estimation step, the algorithm should be fully pre-specified and chosen to optimize its expected performance in the specific real-data application. Simulations that realistically reflect the application, including key characteristics such as strong confounding and rare or missing outcomes, help us understand an estimator’s performance and thereby achieve this goal. We illustrate this with two examples, using the Causal Roadmap and realistic simulations to inform estimator selection and full specification of the Statistical Analysis Plan. First, in an observational longitudinal study, outcome-blind simulations are used to inform nuisance parameter estimation and variance estimation for longitudinal targeted maximum likelihood estimation (TMLE). Second, in a cluster-randomized controlled trial with missing outcomes, exposure-blind simulations are used to ensure Type I error control in Two-Stage TMLE. In both examples, realistic simulations enable us to pre-specify an estimator that is expected to have strong finite-sample performance, and they also yield quality-controlled computing code for the actual analysis. Together, this process helps improve the rigor and reproducibility of our research.
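As a rough illustration of the general idea (not the simulation design used in either example), the sketch below generates data with strong confounding and a rare outcome, then compares the bias and variability of two candidate estimators of a risk difference before any real outcome data are examined. The data-generating values, the function names (`simulate_dataset`, `unadjusted`, `stratified`, `evaluate`), and the choice of a simple stratified estimator in place of TMLE are all assumptions made for illustration.

```python
import numpy as np

TRUE_RD = 0.02  # assumed true risk difference in this hypothetical setup


def simulate_dataset(n, rng):
    """One dataset: binary confounder W, exposure A, rare binary outcome Y."""
    w = rng.binomial(1, 0.5, size=n)
    # Strong confounding: W sharply raises the probability of exposure.
    a = rng.binomial(1, np.where(w == 1, 0.8, 0.2))
    # Rare outcome; exposure adds TRUE_RD to the risk in both strata of W.
    y = rng.binomial(1, 0.02 + TRUE_RD * a + 0.04 * w)
    return w, a, y


def unadjusted(w, a, y):
    """Crude difference in outcome proportions, ignoring W."""
    return y[a == 1].mean() - y[a == 0].mean()


def stratified(w, a, y):
    """Standardization over W (a saturated G-computation for binary W)."""
    est = 0.0
    for level in (0, 1):
        s = w == level
        diff = y[s & (a == 1)].mean() - y[s & (a == 0)].mean()
        est += s.mean() * diff
    return est


def evaluate(n_reps=500, n=2000, seed=0):
    """Repeat the simulation and summarize each estimator's bias and SD."""
    rng = np.random.default_rng(seed)
    estimators = {"unadjusted": unadjusted, "stratified": stratified}
    draws = {name: [] for name in estimators}
    for _ in range(n_reps):
        w, a, y = simulate_dataset(n, rng)
        for name, fn in estimators.items():
            draws[name].append(fn(w, a, y))
    return {
        name: {
            "bias": float(np.mean(vals) - TRUE_RD),
            "sd": float(np.std(vals, ddof=1)),
        }
        for name, vals in draws.items()
    }


if __name__ == "__main__":
    for name, summary in evaluate().items():
        print(name, summary)
```

In a real application the same loop would wrap the candidate TMLE specifications, and the summaries would typically also include confidence interval coverage and Type I error, so that the estimator written into the Statistical Analysis Plan is the one that performed best in simulation.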