DAGs - arguments against their use

What are the shortcomings of DAGs/what are their limitations?

What is the value-added from using DAGs/how do they enhance research?


Background for this topic: I have heard of others not using DAGs, or not using them often, because of their limitations. But what are those limitations? I have never seen them clearly delineated.

Full disclosure: I am someone who finds DAGs to be a very helpful tool for streamlining communication (particularly about selection biases, instrumental variable analyses, time-varying confounding affected by prior exposure, to name a few topics) as long as both parties understand the basic rules of DAGs.

I am not sure what "limitations" are preventing your colleagues from using DAGs. I can think of some cautions (see below), but they are not substantial enough to prevent me from using DAGs. Most of my colleagues who do not like/use DAGs feel that they have some intuition for confounding/selection bias and that DAGs are unnecessary for their research. I concede that in many analyses (now that I have developed a good intuition for confounding/selection bias) I do not go through the step of drawing the DAG on paper when selecting my covariate adjustment set. I just sort of visualize it in my head...

The cautions I would suggest include:

1. The DAG rules for competing events are unclear, and typical practice (drawing separate nodes for the event of interest and the competing event) does not make it clear that you need to adjust for confounders of BOTH the exposure/event-of-interest relationship AND the exposure/competing-event relationship (see Lesko & Lau, Epidemiology 2017 for more on this). From speaking with someone more intimately familiar with all the rules of do-calculus, it sounds like DAGs do actually account for this dependence between the event of interest and the competing event (something called "capital D separation"?), but you have to be more versed in DAGs than the typical epidemiologist to know this. We (I, Bryan Lau, Ilya Shpitser) had a brief conversation and believe that putting all your outcomes in a single multistate node would lead you to correct conclusions about the sufficient set of confounders; however, other aspects of analyzing data in the presence of competing events (such as selection bias) are clearer when you include the outcomes as separate nodes.

2. You can have selection bias without selection being a collider on your DAG that you have conditioned on (see Westreich, Epidemiology 2012). Standard DAG rules will still show that you have d-separation between your exposure and your outcome.

There are other limitations of DAGs that I can think of, but they are not inherent to DAGs, per se. I have had colleagues suggest (in not so many words) that use of a DAG gives a false sense of confidence that you can estimate THE CAUSAL EFFECT. I would say that drawing a DAG merely forces you to be explicit about what you think the confounders are. Being able to achieve confounder control under your DAG does not remove the assumption that your DAG is accurate. It's akin to just saying in words, without a picture or a formal framework, "here is the set of confounders I controlled for" (and thus implicitly, these are the things that I think will get me d-separation on a DAG, if I were to draw the DAG). Using a DAG does not absolve you from making all the causal identifiability assumptions you'd have to make if you didn't use a DAG. If you are uncertain about the structure of the DAG, do sensitivity analyses, analyzing your data as if it arose under alternate possible data generating mechanisms (alternate DAGs).
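As a rough illustration of checking one adjustment set against alternate DAGs, here is a minimal pure-Python sketch. Everything in it is an assumption made for illustration: the helper names (`ancestors`, `d_separated`, `satisfies_backdoor`), the dict-of-parents encoding, and the two toy DAGs (a classic confounder structure and an "M-bias" structure with unmeasured U1 and U2). It tests d-separation with the standard moralized-ancestral-graph criterion and applies the backdoor criterion; it is not a substitute for dedicated software.

```python
from itertools import combinations

def ancestors(dag, nodes):
    """Return `nodes` together with all of their ancestors.

    `dag` maps each child to the set of its parents.
    """
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(dag.get(node, ()))
    return seen

def d_separated(dag, x, y, given):
    """True if x and y are d-separated given `given` (moralized ancestral graph test)."""
    given = set(given)
    keep = ancestors(dag, {x, y} | given)
    neighbours = {node: set() for node in keep}
    for child in keep:
        parents = list(dag.get(child, ()))
        for parent in parents:                 # keep every edge, without direction
            neighbours[child].add(parent)
            neighbours[parent].add(child)
        for p, q in combinations(parents, 2):  # "marry" parents that share a child
            neighbours[p].add(q)
            neighbours[q].add(p)
    seen, stack = set(), [x]
    while stack:                               # graph search that never enters `given`
        node = stack.pop()
        if node == y:
            return False
        if node not in seen:
            seen.add(node)
            stack.extend(neighbours[node] - given)
    return True

def satisfies_backdoor(dag, exposure, outcome, adjust):
    """Backdoor criterion for `adjust` relative to (exposure, outcome)."""
    adjust = set(adjust)
    # No adjustment variable may be a descendant of the exposure.
    if any(exposure in ancestors(dag, {v}) for v in adjust):
        return False
    # Delete the arrows out of the exposure, then require d-separation.
    cut = {child: set(parents) - {exposure} for child, parents in dag.items()}
    return d_separated(cut, exposure, outcome, adjust)

# Two hypothetical data generating mechanisms for the same measured variables.
dag_confounder = {"A": {"C"}, "Y": {"A", "C"}}                        # C -> A, C -> Y
dag_m_bias = {"A": {"U1"}, "C": {"U1", "U2"}, "Y": {"A", "U2"}}       # C is a collider

for name, dag in [("confounder", dag_confounder), ("M-bias", dag_m_bias)]:
    for adjust in (set(), {"C"}):
        ok = satisfies_backdoor(dag, "A", "Y", adjust)
        print(f"{name}: adjust for {sorted(adjust)} -> sufficient: {ok}")
```

Under the confounder DAG, adjusting for C is sufficient and the empty set is not; under the M-bias DAG the conclusion flips. That is the kind of disagreement between candidate DAGs that would motivate analyzing the data both ways.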

Finally, a more minor point: DAGs can be "unfaithful" - that is, associations that should exist in the data under the structure of the DAG may not exist. This may happen if there is perfect balancing out of associations, for example. This is not a problem, per se. It just means that some adjustments will have no impact on bias even though the DAG suggests they should.
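A small numeric sketch of "perfect balancing out", assuming a linear structural model with made-up coefficients (the variable names and values are hypothetical, chosen only so that the two paths cancel exactly):

```python
from fractions import Fraction as F

# Hypothetical linear model for the DAG  A -> M -> Y  plus  A -> Y:
#   M = a_to_m * A + noise
#   Y = a_to_y * A + m_to_y * M + noise
a_to_m = F(2)        # effect of A on M
m_to_y = F(3, 10)    # effect of M on Y
a_to_y = F(-6, 10)   # direct effect of A on Y

indirect_effect = a_to_m * m_to_y          # effect transmitted through M
total_effect = a_to_y + indirect_effect    # what the A-Y association reflects

print("indirect effect:", indirect_effect)
print("total effect:", total_effect)
```

Here the DAG has two open paths from A to Y, yet the total effect is exactly zero, so A and Y would look unassociated in data generated this way. Exact cancellation like this needs very specific coefficient values, which is why it is usually treated as a minor concern.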

I am by no means an expert on DAGs and so welcome additional discussion on this point, but hopefully this response is helpful!

I've only run into DAG-resistance once, and it was from one of those researchers with a "confounder intuition" like Katie mentioned. Otherwise I can't think of a reason not to use them.

Like the others above, I really like causal DAGs. But I think it is important to understand why some very intelligent people might not.

One of the challenges for some people is that causal DAGs do not show interactions. Although some people consider this a limitation, I view it as a strength. First, if there is no interaction on the additive scale, there will be interaction on the multiplicative scale, and vice versa (provided both factors affect the outcome). Therefore, a causal DAG would only be able to show interactions if it were scale specific. However, if the causal DAG suggests no bias with a particular adjustment set, there is no bias on any scale. This makes the tool very powerful. At the same time, people would like the graphs to show interactions because they believe they are important. If a tool does not include something you believe is important, that is a de-motivating factor for its use.
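To make the scale dependence concrete, here is a minimal sketch. The risks are made-up illustrative numbers indexed by (A, C), not data from any study, and the two helper functions are hypothetical names for the usual interaction contrasts.

```python
from fractions import Fraction as F

def additive_interaction(risk):
    """Interaction contrast on the risk-difference scale (0 means none)."""
    return risk[1, 1] - risk[1, 0] - risk[0, 1] + risk[0, 0]

def multiplicative_interaction(risk):
    """Joint risk ratio divided by the product of the separate risk ratios (1 means none)."""
    rr_joint = risk[1, 1] / risk[0, 0]
    rr_a_only = risk[1, 0] / risk[0, 0]
    rr_c_only = risk[0, 1] / risk[0, 0]
    return rr_joint / (rr_a_only * rr_c_only)

# Risks of Y keyed by (A, C); both exposures raise the risk on their own.
no_additive = {(0, 0): F(1, 10), (1, 0): F(2, 10), (0, 1): F(3, 10), (1, 1): F(4, 10)}
no_multiplicative = {(0, 0): F(1, 10), (1, 0): F(2, 10), (0, 1): F(3, 10), (1, 1): F(6, 10)}

for label, risk in [("no additive interaction", no_additive),
                    ("no multiplicative interaction", no_multiplicative)]:
    print(label,
          "| additive contrast:", additive_interaction(risk),
          "| multiplicative ratio:", multiplicative_interaction(risk))
```

In the first table the additive contrast is 0 but the multiplicative ratio is 2/3; in the second the multiplicative ratio is 1 but the additive contrast is 1/5. The same arrows A -> Y and C -> Y would be drawn for both tables, which is the sense in which the DAG is scale free.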

Many people misinterpret causal DAGs because they seem really easy. Some critics believe this encourages lazy thinking. For example, let's say you have arrows from both A and C into Y. Does this mean that if you change C from 0 to 1 in an individual that you will cause a change in the probability of Y? Not exactly. The causal DAG just means Y is some function of its "ancestors". If C only modified the effect of A on Y but had no effect on its own, then changing C from 0 to 1 will change the probability of Y but only for those where A=1.
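A tiny sketch of that last scenario, assuming a hypothetical risk function in which C has no effect of its own and only modifies the effect of A (the coefficients are invented for illustration):

```python
from fractions import Fraction as F

def risk_of_y(a, c):
    """Hypothetical P(Y=1 | A=a, C=c): C enters only through the A*C term."""
    baseline = F(1, 10)
    effect_of_a = F(2, 10)
    modification_by_c = F(3, 10)
    return baseline + effect_of_a * a + modification_by_c * a * c

for a in (0, 1):
    change = risk_of_y(a, 1) - risk_of_y(a, 0)
    print(f"A={a}: changing C from 0 to 1 changes P(Y=1) by {change}")
```

Y is still a function of C here, so the arrow from C into Y belongs on the DAG, even though intervening on C does nothing for anyone with A=0.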

Now what if we link the two paragraphs immediately above, and also include the fact that causal DAGs only need to include variables that lie along "confounding" paths, that is, variables that are causes of both the exposure of interest and the outcome? Let us say that C is a cause of the exposure. Let us also say that C modifies the effect of A on the multiplicative scale but not on the additive scale. If C does not modify the effect on the additive scale, and C has no direct effect on Y, then Y is not a function of C on the additive scale and no arrow should be drawn. However, Y is a function of C on the multiplicative scale, and therefore an arrow should be drawn. These "conflicts" are likely to give some people a headache as they try to work through the issues, and headaches are not considered motivating factors for adopting a behaviour.

Although causal DAGs usually omit variables that do not lie along confounding paths, such variables are important to include for some types of research questions. For some examples, see: Pearl J. A linear “microscope” for interventions and counterfactuals. Journal of Causal Inference. 5(1):341. If one starts to include all of these types of variables, the diagram becomes cluttered very quickly, and hence less useful for "transparently" communicating assumptions.

Finally, causal DAGs represent only one form of causal graphical diagram. Among other options, there are "ancestral graphs" and "single world intervention graphs (SWIGs)". I don't know enough about the advantages and disadvantages of these different approaches, but it is clear that they were developed for particular contexts that their authors felt causal DAGs were not able to address.

There are many relevant and complementary approaches available in the epidemiologist's toolbox, and DAGs are one of them. In my mind they help to provide what I consider to be the contextualization/framing of the study question (whether causal or non-causal). They do lack some elements, which makes them seem like simpler versions of structural equation models and structural causal models; those approaches take the next step of articulating the data generating process. These latter approaches make greater assumptions and, if you believe all models are incorrect, can lead you down a narrower path. In addition, I don't really recall seeing DAGs with signs on their edges; incorporating signs would require greater assumptions and domain knowledge, especially when non-linear relationships or complex relationships/data types could be involved. Given these comments, I feel DAGs have a unique place in framing epidemiologic inquiry, at least in most straightforward examples.