Square shapes only for selection nodes, not adjusted variables #176

Open
mikedenly opened this issue Jan 6, 2025 · 1 comment

@mikedenly

Dear Malcolm,

Thank you so much for your absolutely incredible work on ggdag -- it is a phenomenal package!

I just have a suggestion regarding ggdag_adjustment_sets() and ggdag_dseparated(). Both commands put the adjusted variables in square shapes, but that makes it difficult to distinguish regular nodes from selection nodes for the purposes of selection bias and external validity (e.g., Bareinboim and Pearl 2013). Especially given that dagitty allows for square selection nodes by default, I think it would be helpful if you could choose another shape for adjusted variables in ggdag as well. Maybe octagons that resemble stop signs would be a better choice? This way, it would be clear to the reader that the adjusted variables are indeed blocking something, and octagons would still provide you with enough space to change the label within the node.

Relatedly, the other day I was reading Chapter 4 of your incredible new book, which is sure to become a classic, and I noticed a similar issue. While collider-induced selection bias is indeed the classic case, it seems that you could expand the focus well beyond that to plain selection bias on the exposure, outcome, or a confounder. Putting external validity aside, as that does not seem to be a focus of your book, selection bias on the exposure, outcome, or a confounder often induces positivity violations that make internal validity impossible. In epidemiology, Hernán 2017 has a relevant article. In my field of political science, we have many, many articles on selection bias (e.g., Geddes 1990). Below, you will find the code that I used to make DAGs of such situations for my students.

In any case, thank you in advance for your consideration and, again, thank you for your incredible contributions!

Best,
Mike

# Load libraries
library(ggdag)
library(tidyverse)

# Define the DAG with selection node (S)
selectionX_dag <- dagify(
  Y ~ X + Z,  
  X ~ Z + S,      
  exposure = "X",
  outcome = "Y",
  coords = list(
    x = c(X = 0, Z = 0, Y = 1, S = -0.5),  
    y = c(X = 0, Z = 1, Y = 0, S = 0.5)     
  )
)

# Convert the DAG into a tidy format
tidy_selectionX <- tidy_dagitty(selectionX_dag,
                                 layout = "nicely") %>%
  mutate(
    shape = ifelse(name == "S", "square", "circle") # Square: selection
  )

# Plot the DAG
ggplot(tidy_selectionX) +
  geom_dag_edges(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_node(aes(x = x, y = y, shape = shape, color = name), 
                size = 12, 
                fill = "white", # Colored border, white fill
                stroke = 2) +  
  geom_dag_label(aes(x = x, y = y, label = name), 
                 size = 5, fontface = "bold") +
  scale_shape_manual(values = c("circle" = 21, "square" = 22)) + 
  scale_color_manual(
    values = c("X" = "#F8766D", # matches ggdag_status()
               "Y" = "#00BFC4", # matches ggdag_status()
               "Z" = "gray", 
               "S" = "orange")
  ) +
  theme_dag() +
  labs(title = "Sample Selection Bias on the Independent Variable (X)") +
  theme(
    plot.title = element_text(hjust = 0.5),
    legend.position = "none"  # Remove legends
  )


@malcolmbarrett
Collaborator

Thanks for the kind words and support.

My overall instinct here is that you have done exactly as I'd hoped in the design of ggdag, which is to say, you wanted something besides the default and were able to do it with the underlying geoms.

That said, I haven't thought this particular issue through a lot, and dagitty will soon have selectedNodes() to represent conditioning that is not statistical adjustment, so it may be something worth considering. We'll also cover transportability and generalizability a little in the book (towards the end), so it might be that I have a firmer opinion on this then. So I'll leave this issue open for now.
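
A rough sketch of how the underlying geoms might give adjusted variables and selection nodes distinct shapes, using the DAG from the original post. It assumes `control_for()` adds an `adjusted` column with the levels "adjusted" and "unadjusted". The `role` column is a hypothetical helper introduced only for this illustration, and the diamond (shape 18) merely stands in for the proposed octagon, which is not among ggplot2's built-in point shapes.

```r
library(ggdag)
library(ggplot2)
library(dplyr)

# Same structure as the DAG in the original post
dag <- dagify(
  Y ~ X + Z,
  X ~ Z + S,
  exposure = "X",
  outcome = "Y",
  coords = list(
    x = c(X = 0, Z = 0, Y = 1, S = -0.5),
    y = c(X = 0, Z = 1, Y = 0, S = 0.5)
  )
)

tidy_dag <- dag %>%
  tidy_dagitty() %>%
  # Mark Z as adjusted (assumption: this adds an `adjusted` column)
  control_for("Z", activate_colliders = FALSE) %>%
  mutate(
    # `role` is a hypothetical column used only to drive the shape scale
    role = case_when(
      name == "S" ~ "selection",
      adjusted == "adjusted" ~ "adjusted",
      TRUE ~ "other"
    )
  )

p <- ggplot(tidy_dag, aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_edges() +
  geom_dag_point(aes(shape = role), size = 16) +
  geom_dag_text(aes(label = name)) +
  # Circle for ordinary nodes, square for selection, diamond for adjusted
  scale_shape_manual(values = c(other = 16, selection = 15, adjusted = 18)) +
  theme_dag()

p
```

Keeping the square for the selection node and moving adjusted variables to another shape mirrors the request in the issue. Any other pair of shapes could be swapped in through `scale_shape_manual()`.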
