Confounding: Some introductory readings

Maria Glymour

Maria.glymour@ucsf.edu

Department of Epidemiology and Biostatistics

University of California, San Francisco

  • Hernán MA, Robins JM (2018). Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming Chapter 7, pg 83-98.

    https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2017/10/hernanrobins_v1.10.33.pdf

    This is a great introduction, and I especially like the distinction between confounders and confounding. Their definition of a confounder is quite useful: a variable C is a confounder for estimating the effect of X on Y if C is a member of a sufficient set of covariates for identifying the effect of X on Y, and without C that set of covariates would no longer be sufficient.  In other words, if there’s any combination of covariates that is good enough to eliminate confounding if it includes C but not good enough if you omit C, we say C is a confounder.

  • Greenland, S. and H. Morgenstern (2001). “Confounding in health research.” Annual Review of Public Health 22(1): 189-212.

    https://www.ncbi.nlm.nih.gov/pubmed/11274518

    Greenland and Morgenstern is an older review but gives a very nice introduction to a lot of critical ideas, including design versus analysis approaches to addressing confounding and the distinction between confounding and collapsibility.  If you’re interested in more discussion of Simpson’s paradox, see Pearl’s charming article showing that sometimes people use Simpson’s paradox to refer to confounding phenomena, but in other instances it is used to refer to collider bias (Pearl, Judea. “Comment: understanding Simpson’s paradox.” The American Statistician 68.1 (2014): 8-13).  This article isn’t primarily about confounding, but gets a shout-out for illustrating how a causal framework can help avoid utter confusion.

  • McCulloch CE. Editorial: observational studies, time-dependent confounding, and marginal structural models. Arthritis Rheumatol. 2015 Mar; 67(3):609-11.

    https://www.ncbi.nlm.nih.gov/pubmed/25371384

    This is a clearly written and friendly explanation of time-varying confounding and the motivation for marginal structural models (MSMs). It is aimed at a clinical audience but gives a very nice explanation of the problem that arises when variables are common causes of exposure and outcome but are also potentially influenced by exposure. McCulloch also discusses the tradeoffs of implementing MSMs as opposed to simpler models, with respect to both modeling challenges and statistical precision.

  • Morabia, A. (2011). “History of the modern epidemiological concept of confounding.” Journal of Epidemiology & Community Health 65(4): 297-300. 

    https://www.ncbi.nlm.nih.gov/pubmed/20696848

    Morabia gives a nice perspective on the evolution of how confounding is conceptualized.  It’s easy to forget how recently many core ideas in epi were adopted in the field. In many domains, terminology, methods, and interpretation remain inconsistent across the field.

  • Greenland, S., (2003). Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology, 14(3), pp.300-306. 

    http://journals.lww.com/epidem/abstract/2003/05000/quantifying_biases_in_causal_models__classical.9.aspx

    The reality is that we almost never have perfect data, so in most analyses of observational data we have a few irritating covariates that might be confounders but were also plausibly influenced by past values of the exposure.  This is an unpleasant situation, and Greenland comes to the rescue by helping us think through which type of mistake is worse: adjusting for a collider or failing to adjust for a confounder.

  • Angrist JD, Krueger AB. (2001) Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. The Journal of Economic Perspectives. 15(4):69-85.

    https://www.aeaweb.org/articles?id=10.1257/jep.15.4.69

    Economists worry as much about confounding as epidemiologists do, although they usually refer to it as “omitted variable bias”; the intuition for this label is that confounding is the bias that arises when you omit from your regression model an important correlate of exposure that “should be” included in the model.  What “should be” included in the model? Any correlate of the exposure that would also be correlated with the residuals of the regression if omitted from the model. Economists historically have not used counterfactual conceptualizations or DAGs to represent confounding, but this definition usually has implications similar to those of the causal understanding of confounding (intuitively: the omitted variable is correlated with the residuals because it influences the outcome).  When working with observational data, epidemiologists nearly always use methods to control confounding based on blocking all back-door paths (i.e., measuring a sufficient set of confounders).  In contrast, economists often assume that measuring a sufficient set of confounders is a hopeless endeavor and seek natural experiments to circumvent confounding, leaning on methods such as instrumental variables, regression discontinuity, or difference-in-differences approaches.
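
    For readers who like to see the omitted variable intuition in numbers, here is a minimal simulation sketch. It is not taken from any of the readings above: the variable names, the coefficients (0.8, 1.0, 2.0), the sample size, and the seed are all assumptions chosen for illustration. A single variable C causes both the exposure X and the outcome Y, and the true effect of X on Y is set to 1.0. With these made-up coefficients, the regression that omits C converges to a slope of roughly 1 + 1.6/1.64, or about 1.98, and the regression that includes C should recover a slope near 1.0.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from regressing y on x (with an intercept)."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate(n=20000, seed=0):
    """Hypothetical data: C causes both X and Y; true effect of X on Y is 1.0."""
    rng = random.Random(seed)
    c = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * ci + rng.gauss(0, 1) for ci in c]
    y = [1.0 * xi + 2.0 * ci + rng.gauss(0, 1) for xi, ci in zip(x, c)]
    return c, x, y


c, x, y = simulate()

# "Short" regression that omits C: the slope on X absorbs part of C's effect.
naive = slope(x, y)

# Regression of Y on X and C, computed by residualizing both X and Y on C
# (the Frisch-Waugh-Lovell result): this is the coefficient on X holding C fixed.
adjusted = slope(residuals(c, x), residuals(c, y))

# The omitted variable is associated with the residuals of the short regression,
# which is the economists' signal that it "should be" in the model.
resid_on_c = slope(c, residuals(x, y))

print(f"naive slope (C omitted):   {naive:.2f}")
print(f"adjusted slope (C in model): {adjusted:.2f}")
print(f"slope of naive residuals on C: {resid_on_c:.2f}")
```

    The last quantity is the link between the two vocabularies: C predicts the residuals of the short regression precisely because it influences Y, which is the causal reading of why C is a confounder here.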

    Many thanks to Jeff Martin for suggested additions to this list.
