Jonathan Huang and Brian Whitcomb
While no definitive, go-to pedagogical text yet exists to teach the ins and outs of causal inference in molecular epidemiology, we've put together a list of seven papers that we think address critical aspects of both the promise and the challenges of this task.
The first three (Mehta et al.; Yang et al.; Gadbury et al.) present an overview of the basic challenge of causal inference in the high-dimensional, omics-wide space, focusing on the statistical inferential problems that exist even in experimental settings. Underlying this is a recognized need to evaluate methods using scalable techniques that respect the complex structure and size of genomic data, notably "plasmode"-based simulation.
That said, there remains little discussion about how to address unmeasured confounding and other causal effect identification challenges in this literature. This is a critical gap, since most molecular epidemiologic research will be conducted in non-experimental settings.
The next three papers (Swanson et al.; Cinelli et al.; Shi et al.) discuss the challenges and opportunities that molecular data present for effect identification in observational studies, notably through the use of genetic instrumental variables (Mendelian randomization) and negative controls.
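To make the instrumental variable idea concrete, here is a minimal sketch in Python, not taken from any of the papers above. It simulates a single genetic variant that affects an exposure, an unmeasured confounder that affects both exposure and outcome, and compares the naive regression slope with the Wald ratio estimator often used in Mendelian randomization. All numbers (allele frequency 0.3, variant effect 0.5, true causal effect 0.3) and all function names are illustrative assumptions, and the simulation builds in the instrumental variable conditions that real data may violate.

```python
import random


def simulate(n, true_effect, seed=0):
    """Simulate genotype g, exposure x, and outcome y with an unmeasured confounder u.

    Assumed (hypothetical) data-generating model:
      g ~ number of risk alleles (0, 1, or 2), allele frequency 0.3
      x = 0.5 * g + u + noise
      y = true_effect * x + u + noise
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < 0.3) + (rng.random() < 0.3)  # two independent alleles
        u = rng.gauss(0, 1)  # unmeasured confounder
        xi = 0.5 * gi + u + rng.gauss(0, 1)
        yi = true_effect * xi + u + rng.gauss(0, 1)
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def naive_slope(x, y):
    """Slope from regressing y on x, ignoring the confounder (biased here)."""
    return cov(x, y) / cov(x, x)


def wald_ratio(g, x, y):
    """Wald ratio: (effect of g on y) divided by (effect of g on x)."""
    return cov(g, y) / cov(g, x)


if __name__ == "__main__":
    g, x, y = simulate(n=20000, true_effect=0.3)
    print("naive slope:", round(naive_slope(x, y), 3))
    print("Wald ratio :", round(wald_ratio(g, x, y), 3))
```

In this setup the naive slope is pulled away from the true effect by the confounder, while the Wald ratio should land close to it. That only holds because the simulated variant affects the outcome solely through the exposure and is independent of the confounder.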
The final paper, by Beesley et al., gives a comprehensive overview of the inferential challenges presented by the increasingly popular use of large administrative and biospecimen repository (biobank) data, the fuel for much of recent molecular epidemiology.