Square shapes only for selection nodes, not adjusted variables #176

mikedenly · 2025-01-06T03:29:14Z

Dear Malcolm,

Thank you so much for your absolutely incredible work on ggdag -- it is a phenomenal package!

I just have a suggestion regarding ggdag_adjustment_sets() and ggdag_dseparated(). Both functions draw adjusted variables as squares, which makes it difficult to distinguish adjusted nodes from selection nodes for the purposes of selection bias and external validity (e.g., Bareinboim and Pearl 2013). Especially given that dagitty allows for square selection nodes by default, I think it would be helpful if you could choose another shape for adjusted variables in ggdag. Maybe octagons that resemble stop signs would be a better choice? That way, it would be clear to the reader that the adjusted variables are indeed blocking something, and octagons would still leave enough space to change the label within the node.

Relatedly, the other day I was reading Chapter 4 of your incredible new book, which is sure to become a classic, and I noticed a similar issue. While collider-induced selection bias is indeed the classic case, it seems that you could broaden the focus well beyond it to plain selection on the exposure, outcome, or a confounder. Putting external validity aside, as that does not seem to be a focus of your book, selection on the exposure, outcome, or a confounder often induces positivity violations that make internal validity impossible. In epidemiology, Hernán 2017 has a relevant article. In my field of political science, we have many, many articles on selection bias (e.g., Geddes 1990). Below is some code that I used to draw DAGs of such situations for my students.

In any case, thank you in advance for your consideration and, again, thank you for your incredible contributions!

Best,
Mike

# Load libraries
library(ggdag)
library(tidyverse)

# Define the DAG with selection node (S)
selectionX_dag <- dagify(
  Y ~ X + Z,  
  X ~ Z + S,      
  exposure = "X",
  outcome = "Y",
  coords = list(
    x = c(X = 0, Z = 0, Y = 1, S = -0.5),  
    y = c(X = 0, Z = 1, Y = 0, S = 0.5)     
  )
)

# Convert the DAG into a tidy format
tidy_selectionX <- tidy_dagitty(selectionX_dag,
                                 layout = "nicely") %>%
  mutate(
    shape = ifelse(name == "S", "square", "circle") # Square: selection
  )

# Plot the DAG
ggplot(tidy_selectionX) +
  geom_dag_edges(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_node(aes(x = x, y = y, shape = shape, color = name), 
                size = 12, 
                fill = "white", # Colored border, white fill
                stroke = 2) +  
  geom_dag_label(aes(x = x, y = y, label = name), 
                 size = 5, fontface = "bold") +
  scale_shape_manual(values = c("circle" = 21, "square" = 22)) + 
  scale_color_manual(
    values = c("X" = "#F8766D", # matches ggdag_status()
               "Y" = "#00BFC4", # matches ggdag_status()
               "Z" = "gray", 
               "S" = "orange")
  ) +
  theme_dag() +
  labs(title = "Sample Selection Bias on the Independent Variable (X)") +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "none"  # Remove legends
  )


malcolmbarrett · 2025-01-10T13:31:05Z

Thanks for the kind words and support.

My overall instinct here is that you have done exactly as I'd hoped in the design of ggdag, which is to say, you wanted something besides the default and were able to do it with the underlying geoms.

That said, I haven't thought this particular issue through much, and dagitty will soon have selectedNodes() to represent conditioning that is not statistical adjustment, so it may be worth considering. We'll also cover transportability and generalizability a little in the book (towards the end), so I may have a firmer opinion on this then. I'll leave this issue open for now.
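
For readers who want a non-square shape for adjusted variables in the meantime, here is a minimal sketch of the same "underlying geoms" approach. It assumes a toy DAG in which Z is the only adjusted variable, and it relies on control_for() adding an `adjusted` column with the values "adjusted" and "unadjusted". ggplot2's built-in point shapes do not include an octagon, so a filled diamond (shape 18) is used purely as a stand-in. The DAG and the shape codes are illustrative choices, not a proposed ggdag default.

```r
library(ggdag)
library(ggplot2)

# Hypothetical toy DAG: Z confounds the X -> Y relationship
adjusted_dag <- dagify(
  Y ~ X + Z,
  X ~ Z,
  exposure = "X",
  outcome = "Y"
)

# control_for() flags Z as adjusted; the manual shape scale then
# replaces the default square with a diamond for adjusted nodes
p <- adjusted_dag |>
  control_for("Z") |>
  ggplot(aes(x = x, y = y, xend = xend, yend = yend, shape = adjusted)) +
  geom_dag_edges() +
  geom_dag_point() +
  geom_dag_text() +
  scale_shape_manual(values = c(adjusted = 18, unadjusted = 19)) +
  theme_dag()

p
```

With this mapping, squares stay free for selection nodes, which can be drawn as in the example earlier in the thread.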
