Support simultaneous stacking and dodging by different variables in geom_col #6324

bakaburg1 · 2025-02-06T11:25:20Z

I'd like to propose adding support for simultaneous stacking and dodging controlled by different variables in geom_col. Currently, this common visualization need requires workarounds that are both verbose and harder to maintain.

Current Limitation

When using geom_col, we can either stack or dodge bars based on a grouping variable, but not both at the same time using different variables. This makes it difficult to create visualizations where we want to:

Stack bars by one categorical variable
Dodge the resulting stacks by another categorical variable

Here's a reprex with counts from surveillance data stratified by year, country, and surveillance protocol.

Minimal Reproducible Example

library(ggplot2)
library(dplyr)

# Sample data
df <- bind_rows(
    data.frame(
        year = rep(2016, 5),
        protocol = rep("M", 5),
        country = c("A", "B", "C", "D", "E"),
        freq = c(100, 50, 30, 40, 11)
    ),
    data.frame(
        year = rep(2016, 4),
        protocol = rep("L", 4),
        country = c("A", "B", "C", "D"),
        freq = c(23, 60, 200, 100)
    )
)

# Current workaround requires multiple geom_col calls
ggplot() +
    geom_col(
        data = df %>% filter(protocol == "M"),
        aes(x = year - .5, y = freq,
            fill = protocol, group = country),
        position = "stack",
        width = 0.4
    ) +
    geom_col(
        data = df %>% filter(protocol == "L"),
        aes(x = year + .5, y = freq,
            fill = protocol, group = country),
        position = "stack",
        width = 0.4
    )

Desired Behavior

Ideally, we would be able to specify both stacking and dodging variables in a single geom_col call, something like:

# Conceptual syntax (not working)
ggplot(df, aes(x = year, y = freq)) +
    geom_col(
        aes(fill = protocol, group = country),
        position = position_stackdodge(
            stack_by = "country",
            dodge_by = "protocol"
        )
    )

Use Cases

This functionality would be particularly useful for:

Comparing distributions across multiple categories
Visualizing nested hierarchical data
Creating more complex compositional charts without resorting to hacky solutions
Maintaining consistent spacing and positioning without manual x-axis adjustments

Benefits

More intuitive API for common visualization needs
Reduced code complexity
Better maintainability
Consistent positioning and spacing handled by ggplot2
Easier integration with scales and themes

teunbrand · 2025-02-06T11:47:08Z

Thanks for the report! This request is similar to #2267, which was closed as unplanned.
I think one reason we've been reluctant to implement this is because it would break the API as position adjustments do not have the right authority to include variables (like stack_by and dodge_by) from the data.
However, because we implemented #6100, I think this limitation no longer holds and this suggestion no longer would break the API.
For these reasons, I think this should be possible, but I'm not yet convinced that it belongs to ggplot2 and not an extension package.

bakaburg1 · 2025-02-06T13:42:44Z

Thank you!

In the meantime (with great help from various AIs) I developed an ad hoc geom. I still think that a position_ function is more appropriate, since it could accommodate other geoms too (and I don't like the idea of a geom just for positioning), but I wasn't able to make one. Regarding whether to put it in ggplot2 or not, I would advise the former. I was very surprised that this was not already possible; it's something one would expect out of the box!

GeomStackDodgeCol <- ggproto(
    "GeomStackDodgeCol", GeomRect,
    required_aes = c("x", "y", "fill", "group"),
    default_aes = aes(
        colour = "black",
        linewidth = 0.5,
        linetype = 1,
        alpha = NA
    ),
    
    setup_data = function(data, params) {
        # Reset stacking for each x value and fill group
        data <- data |>
            group_by(x, fill) |>
            mutate(
                ymin = c(0, head(cumsum(y), -1)),
                ymax = cumsum(y)
            ) |>
            ungroup()
        
        # Compute dodging offsets with width and padding
        fill_groups <- unique(data$fill)
        n_groups <- length(fill_groups)
        width <- params$width %||% 0.9     # width of the bars
        padding <- params$padding %||% 0.1  # padding between bars
        
        # Calculate total width needed for the group
        total_width <- n_groups * width + (n_groups - 1) * padding * width
        
        # Calculate positions with proper spacing
        positions <- seq(-total_width/2, total_width/2, length.out = n_groups)
        
        # Create rectangle coordinates
        data$xmin <- data$x + positions[match(data$fill, fill_groups)] - width/2
        data$xmax <- data$x + positions[match(data$fill, fill_groups)] + width/2
        
        data
    },
    
    draw_panel = function(data, panel_params, coord, width = 0.9, ...) {
        coords <- coord$transform(data, panel_params)
        
        grid::rectGrob(
            x = (coords$xmin + coords$xmax)/2,
            y = (coords$ymin + coords$ymax)/2,
            width = coords$xmax - coords$xmin,
            height = coords$ymax - coords$ymin,
            default.units = "native",
            just = c("center", "center"),
            gp = grid::gpar(
                col = coords$colour,
                fill = alpha(coords$fill, coords$alpha),
                lwd = coords$linewidth * .pt,
                lty = coords$linetype
            )
        )
    },
    
    parameters = function(complete = FALSE) {
        c("na.rm", "width", "padding")
    }
)

geom_stackdodge_col <- function(mapping = NULL, data = NULL,
                            position = "identity", 
                            width = 0.9,
                            padding = 0.1,
                            na.rm = FALSE,
                            show.legend = NA,
                            inherit.aes = TRUE, ...) {
    layer(
        geom = GeomStackDodgeCol,
        mapping = mapping,
        data = data,
        stat = "identity",
        position = position,
        show.legend = show.legend,
        inherit.aes = inherit.aes,
        params = list(
            na.rm = na.rm,
            width = width,
            padding = padding
        )
    )
}

Of course, more testing is needed.

Here's some testing code:

local({
    df <- bind_rows(
        data.frame(
            year = rep(2016, 5),
            protocol = rep("M", 5),
            country = c("A", "B", "C", "D", "E"),
            freq = c(100, 50, 30, 40, 11) # sum is 231
        ),
        data.frame(
            year = rep(2016, 4),
            protocol = rep("L", 4),
            country = c("A", "B", "C", "D"),
            freq = c(23, 60, 200, 100) # sum is 383
        )
    )
    
    # Add more years
    df <- bind_rows(
        df,
        df |> mutate(year = 2017, freq = sample(freq))
    )
    
    # Create summary data
    df_sum <- df |>
        summarise(
            label = paste(country, collapse = "\n"),
            freq = sum(freq),
            .by = c(year, protocol)
        )
    ggplot() +
        geom_stackdodge_col(
            data = df,
            aes(x = factor(year), y = freq, group = country,
                fill = protocol),
            width = 0.1, padding = 0.5
        ) +
        # Reference lines showing that the 2016 bars sum to the expected values
        geom_hline(yintercept = c(
            sum(c(100, 50, 30, 40, 11)), # 231
            sum(c(23, 60, 200, 100))     # 383
        ))
})
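
The stacking step in setup_data above can also be checked in isolation. The following is a minimal base R sketch, not part of the geom; the helper name stack_bounds is hypothetical. It applies the same cumsum idiom to the 2016 frequencies and confirms that the tops of the stacks land on the totals noted in the comments (231 and 383).

```r
# Hypothetical helper: lower and upper bounds of stacked segments,
# using the same cumsum idiom as setup_data above.
stack_bounds <- function(y) {
  ymax <- cumsum(y)
  ymin <- c(0, head(ymax, -1))
  data.frame(ymin = ymin, ymax = ymax)
}

m <- stack_bounds(c(100, 50, 30, 40, 11))  # protocol M, 2016
l <- stack_bounds(c(23, 60, 200, 100))     # protocol L, 2016

# The top of each stack equals the sum of its segments.
max(m$ymax)  # 231
max(l$ymax)  # 383
```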

clauswilke · 2025-02-06T15:40:48Z

> Regarding whether to put it ggplot or not I would advise for the first solution. I was very surprised in the first place this was not possible already, it's something one would expect out of the box!

We have for many years now followed the philosophy that only the absolute core features are in ggplot2 itself and other, less commonly used features should go into extension packages. Maybe this would be a good fit for ggforce for example.

Also, while I'm of the opinion that everybody should be allowed and empowered to make any visualization they want, I find it difficult to think of a valid use case for this geom. I've never in my life thought "hm, I want to stack and dodge at the same time." This is definitely an obscure corner case, and I feel reasonably confident that any figure you make with this feature can be improved by removing one of the two position adjustments.

bakaburg1 · 2025-02-06T16:56:14Z

Uhm, it's a pretty common scenario in epidemiology!

Should I cross post it to ggforce? Do they work also on position functions or only on geoms?

clauswilke · 2025-02-06T16:59:42Z

Show me a figure that uses this feature and I'll tell you how to improve the figure.

smouksassi · 2025-02-06T20:48:53Z

Sorry, my reprex is crashing, but this shows that you can do what you want using facets:

library(ggplot2)
library(dplyr)
library(patchwork)

# Sample data
df <- bind_rows(
    data.frame(
        year = rep(2016, 5),
        protocol = rep("M", 5),
        country = c("A", "B", "C", "D", "E"),
        freq = c(100, 50, 30, 40, 11)
    ),
    data.frame(
        year = rep(2016, 4),
        protocol = rep("L", 4),
        country = c("A", "B", "C", "D"),
        freq = c(23, 60, 200, 100)
    ),
    data.frame(
        year = rep(2017, 5),
        protocol = rep("M", 5),
        country = c("A", "B", "C", "D", "E"),
        freq = c(100, 50, 30, 40, 11)
    ),
    data.frame(
        year = rep(2017, 4),
        protocol = rep("L", 4),
        country = c("A", "B", "C", "D"),
        freq = c(23, 60, 200, 100)
    )
)

# Stack by country, with protocol on the x axis and one facet per year
a <- ggplot(data = df) +
    geom_col(
        aes(x = protocol, y = freq,
            fill = country, group = country),
        position = "stack",
        width = 0.4
    ) +
    scale_fill_viridis_d() +
    facet_grid(~year)

# Stack by country, with year on the x axis and one facet per protocol
b <- ggplot(data = df) +
    geom_col(
        aes(x = as.factor(year), y = freq,
            fill = country, group = country),
        position = "stack",
        width = 0.4
    ) +
    scale_fill_viridis_d() +
    facet_grid(~protocol)

a / b

davidhodge931 · 2025-02-10T21:08:27Z

I think stacking and dodging at the same time is useful. I've needed to do this in the past. I get by with hacking around using a combo of faceting, scale and theme adjustments. But it'd be awesome if a position_stackdodge function or similar was available to do this in a more elegant way

smouksassi · 2025-02-11T09:11:16Z

Some considerations: the default width of the bars and the ordering of factors.
Here is what is currently possible and what can be done using the code above:

library(ggplot2)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(patchwork)

GeomStackDodgeCol <- ggproto(
  "GeomStackDodgeCol", GeomRect,
  required_aes = c("x", "y", "fill", "group"),
  default_aes = aes(
    colour = "red",
    linewidth = 0.5,
    linetype = 1,
    alpha = NA
  ),
  
  setup_data = function(data, params) {
    # Reset stacking for each x value and fill group
    data <- data |>
      group_by(x, fill) |>
      mutate(
        ymin = c(0, head(cumsum(y), -1)),
        ymax = cumsum(y)
      ) |>
      ungroup()
    
    # Compute dodging offsets with width and padding
    fill_groups <- unique(data$fill)
    n_groups <- length(fill_groups)
    width <- params$width %||% 0.9     # width of the bars
    padding <- params$padding %||% 0.1  # padding between bars
    
    # Calculate total width needed for the group
    total_width <- n_groups * width + (n_groups - 1) * padding * width
    
    # Calculate positions with proper spacing
    positions <- seq(-total_width/2, total_width/2, length.out = n_groups)
    
    # Create rectangle coordinates
    data$xmin <- data$x + positions[match(data$fill, fill_groups)] - width/2
    data$xmax <- data$x + positions[match(data$fill, fill_groups)] + width/2
    
    data
  },
  
  draw_panel = function(data, panel_params, coord, width = 0.9, ...) {
    coords <- coord$transform(data, panel_params)
    
    grid::rectGrob(
      x = (coords$xmin + coords$xmax)/2,
      y = (coords$ymin + coords$ymax)/2,
      width = coords$xmax - coords$xmin,
      height = coords$ymax - coords$ymin,
      default.units = "native",
      just = c("center", "center"),
      gp = grid::gpar(
        col = coords$colour,
        fill = alpha(coords$fill, coords$alpha),
        lwd = coords$linewidth * .pt,
        lty = coords$linetype
      )
    )
  },
  
  parameters = function(complete = FALSE) {
    c("na.rm", "width", "padding")
  }
)

geom_stackdodge_col <- function(mapping = NULL, data = NULL,
                                position = "identity", 
                                width = 0.9,
                                padding = 0.1,
                                na.rm = FALSE,
                                show.legend = NA,
                                inherit.aes = TRUE, ...) {
  layer(
    geom = GeomStackDodgeCol,
    mapping = mapping,
    data = data,
    stat = "identity",
    position = position,
    show.legend = show.legend,
    inherit.aes = inherit.aes,
    params = list(
      na.rm = na.rm,
      width = width,
      padding = padding
    )
  )
}
    df <- bind_rows(
        data.frame(
            year = rep(2016, 5),
            protocol = rep("M", 5),
            country = c("A", "B", "C", "D", "E"),
            freq = c(100, 50, 30, 40, 11) # sum is 231
        ),
        data.frame(
            year = rep(2016, 4),
            protocol = rep("L", 4),
            country = c("A", "B", "C", "D"),
            freq = c(23, 60, 200, 100) # sum is 383
        )
    )
    df <- bind_rows(
        df,
        df |> mutate(year = 2017, freq = sample(freq)),
    )
    
    # Create summary data
    df_sum <- df |>
        summarise(
            label = paste(country, collapse = "\n"),
            freq = sum(freq),
            .by = c(year, protocol)
        )
    ggplot() +
        geom_stackdodge_col(
            data = df,
            aes(x = factor(year), y = freq, group = country,
                fill = protocol),
            width = 0.1, padding = 0.5
        )

                                  
  
    

    #Sample data
    df <- bind_rows(
      data.frame(
        year = rep(2016, 5),
        protocol = rep("M", 5),
        country = c("A", "B", "C", "D", "E"),
        freq = c(100, 50, 30, 40, 11)
      ),
      data.frame(
        year = rep(2016, 4),
        protocol = rep("L", 4),
        country = c("A", "B", "C", "D"),
        freq = c(23, 60, 200, 100)
      ),
      data.frame(
        year = rep(2017, 5),
        protocol = rep("M", 5),
        country = c("A", "B", "C", "D", "E"),
        freq = c(100, 50, 30, 40, 11)
      ),
      data.frame(
        year = rep(2017, 4),
        protocol = rep("L", 4),
        country = c("A", "B", "C", "D"),
        freq = c(23, 60, 200, 100)
      )
    )
    

    
    
    
    a<- ggplot(data = df) +
      geom_col(
        aes(x = protocol, y = freq,
            fill = country, group = country),
        position = "stack",
        width = 0.4
      ) +
      scale_fill_viridis_d()+
      facet_grid(~year)
    b <- ggplot(data = df) +
      geom_col(
        aes(x = as.factor(year) , y = freq,
            fill = country, group = country),
        position = "stack",
        width = 0.4
      ) +
      scale_fill_viridis_d()+
      facet_grid(~protocol)
    
    a2<- ggplot(data = df) +
      geom_col(
        aes(x = protocol, y = freq,
            fill = protocol, group = country),
        position = "stack",color="red",
        width = 0.4
      ) +
      scale_fill_viridis_d()+
      facet_grid(~year) 
    c <- ggplot(data = df) +
      geom_stackdodge_col(
        aes(x = as.factor(year), y = freq,
            fill = protocol, group = country),
        width = 0.1, padding = 0
      ) +
      scale_fill_viridis_d()
    a / a2 / c + plot_layout(guides = "collect")

Created on 2025-02-11 with reprex v2.1.1
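
On the two considerations above (bar width and factor ordering), here is a small base R sketch of the offset arithmetic. It is an illustration under stated assumptions rather than a fix to the geom: the helper dodge_centres is hypothetical, and it assumes that padding is meant as a fraction of the bar width, as the comments in setup_data suggest. It also shows that unique() returns groups in order of first appearance while factor levels are sorted, which may account for differences in the left-to-right order of the dodged bars.

```r
# Hypothetical helper: centres of n dodged bars around 0, so that the
# bars plus the gaps between them span
# n * width + (n - 1) * padding * width in total.
dodge_centres <- function(n, width = 0.9, padding = 0.1) {
  (seq_len(n) - (n + 1) / 2) * width * (1 + padding)
}

dodge_centres(2, width = 0.1, padding = 0.5)  # -0.075  0.075
dodge_centres(1)                              # 0: a single group stays centred

# By contrast, seq() with length.out = 1 returns its `from` value,
# so a single group would be shifted to the left.
seq(-0.45, 0.45, length.out = 1)              # -0.45

# Ordering: unique() keeps first-appearance order; factor levels are sorted.
protocol <- c("M", "M", "L")
unique(protocol)          # "M" "L"
levels(factor(protocol))  # "L" "M"
```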

teunbrand · 2025-02-11T11:33:15Z

Thanks everyone for the examples. I don't think implementation is the barrier for this issue, but Claus' remark below is:

> We have for many years now followed the philosophy that only the absolute core features are in ggplot2 itself and other, less commonly used features should go into extension packages.

So the relevant question is whether simultaneously stacking and dodging is a core feature or not. On the one hand, I think (but am not wholly convinced) that this can be useful in some circumstances. On the other hand, I want to agree with Claus that there most likely is a better way to display the data than stacking and dodging.

clauswilke · 2025-02-11T16:54:10Z

I know we're somewhat offtopic now, but since the question is "is this a core feature" and not "should this be possible at all", I want to point out that stacking more than two categories is almost always bad, because it's usually impossible to actually compare the stacked data values. I discuss this in my book here: https://clauswilke.com/dataviz/visualizing-proportions.html

In addition, I'm not a fan of mixing a display of proportions (which you get by stacking) with a display of absolute values (which you get from bars that have different overall heights). It creates additional confusion in the viewer, as the absolute amount of something may increase from one condition to the other while the relative proportion goes down.

I generally discourage people from stacking unless they're dealing with a binary variable (male/female, success/failure, etc).
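
The point about mixing proportions with absolute values can be illustrated with the 2016 sample data from this thread. This is a small base R sketch of the arithmetic, not something posted in the discussion: country B's count goes up from protocol M to protocol L while its share of the stack goes down.

```r
# 2016 sample data from the original post
freq_m <- c(A = 100, B = 50, C = 30, D = 40, E = 11)  # protocol M, total 231
freq_l <- c(A = 23, B = 60, C = 200, D = 100)          # protocol L, total 383

# Share of each country within its stack
share_m <- freq_m / sum(freq_m)
share_l <- freq_l / sum(freq_l)

# Country B: the absolute count rises (50 to 60) ...
freq_l[["B"]] - freq_m[["B"]]  # 10
# ... while its share of the stack falls (about 21.6% to 15.7%).
round(100 * c(M = share_m[["B"]], L = share_l[["B"]]), 1)
```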

davidhodge931 · 2025-02-11T20:30:19Z

Some Stack Overflow and Posit Community questions on this:

https://stackoverflow.com/questions/12715635/ggplot2-bar-plot-with-both-stack-and-dodge
https://stackoverflow.com/questions/43281303/combine-stack-and-dodge-with-bar-plot-in-ggplot2
https://stackoverflow.com/questions/65955368/making-a-bar-plot-with-stack-and-dodge-and-keep-the-dodged-bars-touching-one-an
https://stackoverflow.com/questions/67431155/ggplot-combine-dodge-with-stacked-barplot
https://forum.posit.co/t/how-to-ggplot-a-stacked-and-dodge-bar-chart-in-one/174243
https://forum.posit.co/t/geom-col-both-stacked-and-dodged-by-different-variables-have-wrong-bar-totals/197504
https://forum.posit.co/t/ggplot-position-dodge-with-position-stack/16425

davidhodge931 · 2025-02-11T20:52:57Z

Below is an example, copied from the internet, of a graph that is both stacked and dodged.

In this one, the stacked type variable has heaps of values, but it could instead be a simple binary variable like Male/Female. It's also not super clear at present what the dodged stuff represents. But an alpha or pattern aesthetic could be used here.

You could do this instead by faceting - but then maybe you wanted to facet by a different variable anyway. You could potentially do this using patchwork. But everything gets hacky and difficult, compared to if there was a position adjustment for it
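
For comparison, one way to keep everything in a single layer without faceting is to precompute the rectangle coordinates and draw them with geom_rect(). The sketch below is only an illustration using the 2016 sample data from the original post; the column names ymin, ymax, xmin and xmax and the bar width of 0.4 are choices made here, not part of any proposed API.

```r
library(ggplot2)

# 2016 sample data from the original post
df <- data.frame(
  year = 2016,
  protocol = rep(c("M", "L"), times = c(5, 4)),
  country = c("A", "B", "C", "D", "E", "A", "B", "C", "D"),
  freq = c(100, 50, 30, 40, 11, 23, 60, 200, 100)
)

width <- 0.4  # assumed bar width

# Stack: cumulative sums within each year/protocol combination.
key <- interaction(df$year, df$protocol, drop = TRUE)
df$ymax <- ave(df$freq, key, FUN = cumsum)
df$ymin <- df$ymax - df$freq

# Dodge: one slot per protocol (L = 1, M = 2), centred on the year.
slot <- as.integer(factor(df$protocol))
n <- nlevels(factor(df$protocol))
centre <- df$year + (slot - (n + 1) / 2) * width
df$xmin <- centre - width / 2
df$xmax <- centre + width / 2

p <- ggplot(df) +
  geom_rect(
    aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = country),
    colour = "black"
  ) +
  scale_x_continuous(breaks = unique(df$year))
```

Printing p should draw two adjacent stacks for 2016, one per protocol, with the countries stacked inside each. The bookkeeping done by hand here is roughly what a position adjustment would otherwise take care of.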

thomasp85 · 2025-02-11T21:28:10Z

The "can it be done" and "is it being done" questions have already been settled. The question is whether it should be in ggplot2 or in an extension package. ggplot2 is opinionated and, like with secondary axes, we sometimes steer away from "popular" approaches because they are flawed by design. I'm afraid this also falls into such a category, which means that it will not end up in ggplot2 proper. However, we made the system extensible for a reason, so that you are not beholden to our pet peeves :-)

szkabel · 2025-02-11T21:47:58Z

Hi All,
This seems to be an exciting discussion.
About a year ago I had a feature request to stratify a column chart by a variable using shades. The plot was already dodged by colors. My solution was to extend the functionality of position_dodge.

Please see my pull-request: #6328
A bit more lengthy description here: https://rpubs.com/szkabel/dodgeStackDemo

I think that this is a needed feature as shown by the questions collected by @davidhodge931.

This was tested only for geom_col, but that seemed to be the most needed anyways. I also think it is more elegant than most of the above solutions. A minimal working example for the above case:

library(tidyverse)

base = bind_rows(
    tibble(count = c(10,9,26),type = factor(c(1:3))) %>% mutate(category = "A"),
    tibble(count = c(80,90,60),type = factor(c(1:3))) %>% mutate(category = "B") 
)
    
df = bind_rows(
  base %>% mutate(id = 1),
  base %>% mutate(id = 2),
  base %>% mutate(id = 3)
)

df %>% ggplot() + 
  aes(x = id,y = count, fill = type, alpha = category, group = category) +
  geom_col(position = position_dodge(stack_overlap = "by_extent")) +
  scale_alpha_manual(values = c("A" = 0.5, "B" = 1))

I acknowledge that it doesn't yet work for the labels.

teunbrand added the positions 🥇 and feature (a feature request or enhancement) labels on Feb 6, 2025

thomasp85 closed this as not planned on Feb 11, 2025
