Support simultaneous stacking and dodging by different variables in geom_col #6324
Thanks for the report! This request is similar to #2267, which was closed as unplanned.
Thank you! In the meantime (with a lot of help from various AIs) I developed an ad hoc geom. I still think a position_ function would be more appropriate, since it could accommodate other geoms too (and I don't like the idea of a geom just for positioning), but I wasn't able to write one. Regarding whether to put it in ggplot2 or not, I would advise the first option. I was very surprised that this was not already possible; it's something one would expect out of the box!

library(ggplot2)
library(dplyr)
library(rlang) # for %||% on R < 4.4.0 (it is in base R from 4.4.0)

GeomStackDodgeCol <- ggproto(
"GeomStackDodgeCol", GeomRect,
required_aes = c("x", "y", "fill", "group"),
default_aes = aes(
colour = "black",
linewidth = 0.5,
linetype = 1,
alpha = NA
),
setup_data = function(data, params) {
# Reset stacking for each panel, x value and fill group
data <- data |>
group_by(PANEL, x, fill) |>
mutate(
ymin = c(0, head(cumsum(y), -1)),
ymax = cumsum(y)
) |>
ungroup()
# Compute dodging offsets with width and padding
fill_groups <- unique(data$fill)
n_groups <- length(fill_groups)
width <- params$width %||% 0.9 # width of the bars
padding <- params$padding %||% 0.1 # padding between bars
# Calculate total width needed for the group
total_width <- n_groups * width + (n_groups - 1) * padding * width
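# Bar centres: the outer centres sit half a bar width inside total_width,
# so adjacent bars are separated by exactly padding * width
positions <- seq(-(total_width - width)/2, (total_width - width)/2, length.out = n_groups)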
# Create rectangle coordinates
data$xmin <- data$x + positions[match(data$fill, fill_groups)] - width/2
data$xmax <- data$x + positions[match(data$fill, fill_groups)] + width/2
data
},
draw_panel = function(data, panel_params, coord, width = 0.9, ...) {
coords <- coord$transform(data, panel_params)
grid::rectGrob(
x = (coords$xmin + coords$xmax)/2,
y = (coords$ymin + coords$ymax)/2,
width = coords$xmax - coords$xmin,
height = coords$ymax - coords$ymin,
default.units = "native",
just = c("center", "center"),
gp = grid::gpar(
col = coords$colour,
fill = alpha(coords$fill, coords$alpha),
lwd = coords$linewidth * .pt,
lty = coords$linetype
)
)
},
parameters = function(complete = FALSE) {
c("na.rm", "width", "padding")
}
)
geom_stackdodge_col <- function(mapping = NULL, data = NULL,
position = "identity",
width = 0.9,
padding = 0.1,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
layer(
geom = GeomStackDodgeCol,
mapping = mapping,
data = data,
stat = "identity",
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
na.rm = na.rm,
width = width,
padding = padding
)
)
}

Of course, testing is mandatory. Here's some testing code:

local({
df <- bind_rows(
data.frame(
year = rep(2016, 5),
protocol = rep("M", 5),
country = c("A", "B", "C", "D", "E"),
freq = c(100, 50, 30, 40, 11) # sum is 231
),
data.frame(
year = rep(2016, 4),
protocol = rep("L", 4),
country = c("A", "B", "C", "D"),
freq = c(23, 60, 200, 100) # sum is 383
)
)
# Add more years
df <- bind_rows(
df,
df |> mutate(year = 2017, freq = sample(freq)),
)
# Create summary data
df_sum <- df |>
summarise(
label = paste(country, collapse = "\n"),
freq = sum(freq),
.by = c(year, protocol)
)
ggplot() +
geom_stackdodge_col(
data = df,
aes(x = factor(year), y = freq, group = country,
fill = protocol),
width = 0.1, padding = 0.5
) +
geom_hline(yintercept = c(sum(c(100, 50, 30, 40, 11)), sum(c(23, 60, 200, 100)))) # Show that the bars sum up to the expected totals
})
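The arithmetic in `setup_data()` can be checked on its own. Below is a minimal base-R sketch, assuming the intent is bars of width `width` separated by gaps of `padding * width`; `stack_bounds()` and `dodge_centres()` are hypothetical helper names, not part of ggplot2. It also illustrates why the outer bar centres should sit at ±(total_width - width)/2: centring them at ±total_width/2 makes the bars overflow `total_width`, widens the gaps beyond `padding * width`, and, because `seq()` with `length.out = 1` returns its `from` value, shifts a single fill group left by `width / 2`.

```r
# Hypothetical helpers mirroring the setup_data() arithmetic of the geom above

# Stacking: within each x/fill cell, each bar starts where the previous one ends
stack_bounds <- function(x, fill, y) {
  key <- interaction(x, fill, drop = TRUE)
  ymax <- ave(y, key, FUN = cumsum)
  data.frame(x = x, fill = fill, ymin = ymax - y, ymax = ymax)
}

# Dodging: n bars of width `width`, separated by gaps of `padding * width`,
# centred on 0; the outer centres sit half a bar width inside total_width
dodge_centres <- function(n, width = 0.9, padding = 0.1) {
  total_width <- n * width + (n - 1) * padding * width
  half_span <- (total_width - width) / 2
  seq(-half_span, half_span, length.out = n)
}

stack_bounds(x = c(1, 1, 1), fill = c("M", "M", "L"), y = c(100, 50, 23))
# ymin: 0 100 0, ymax: 100 150 23

centres <- dodge_centres(2, width = 0.1, padding = 0.5)
centres              # -0.075  0.075
diff(centres) - 0.1  # gap between the two bars: 0.05 = padding * width
dodge_centres(1)     # a single group stays centred on x
```

Stacking uses `ave()` so that each x/fill cell accumulates independently, which is what `group_by(x, fill)` plus `cumsum()` does in the geom.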
We have for many years now followed the philosophy that only the absolute core features are in ggplot2 itself and other, less commonly used features should go into extension packages. Maybe this would be a good fit for ggforce, for example. Also, while I'm of the opinion that everybody should be allowed and empowered to make any visualization they want, I find it difficult to think of a valid use case for this geom. I've never in my life thought "hm, I want to stack and dodge at the same time." This is definitely an obscure corner case, and I feel reasonably confident that any figure you make with this feature can be improved by removing one of the two position adjustments.
Uhm, it's a pretty common scenario in epidemiology! Should I cross-post it to ggforce? Do they also work on position functions, or only on geoms?
Show me a figure that uses this feature and I'll tell you how to improve the figure.
I think stacking and dodging at the same time is useful. I've needed to do this in the past. I get by with hacking around using a combo of faceting, scale and theme adjustments. But it'd be awesome if a
Considerations: the default width of the bars, and also the ordering of factors:

library(ggplot2)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(patchwork)
GeomStackDodgeCol <- ggproto(
"GeomStackDodgeCol", GeomRect,
required_aes = c("x", "y", "fill", "group"),
default_aes = aes(
colour = "red",
linewidth = 0.5,
linetype = 1,
alpha = NA
),
setup_data = function(data, params) {
# Reset stacking for each panel, x value and fill group
data <- data |>
group_by(PANEL, x, fill) |>
mutate(
ymin = c(0, head(cumsum(y), -1)),
ymax = cumsum(y)
) |>
ungroup()
# Compute dodging offsets with width and padding
fill_groups <- unique(data$fill)
n_groups <- length(fill_groups)
width <- params$width %||% 0.9 # width of the bars
padding <- params$padding %||% 0.1 # padding between bars
# Calculate total width needed for the group
total_width <- n_groups * width + (n_groups - 1) * padding * width
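# Bar centres: the outer centres sit half a bar width inside total_width,
# so adjacent bars are separated by exactly padding * width
positions <- seq(-(total_width - width)/2, (total_width - width)/2, length.out = n_groups)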
# Create rectangle coordinates
data$xmin <- data$x + positions[match(data$fill, fill_groups)] - width/2
data$xmax <- data$x + positions[match(data$fill, fill_groups)] + width/2
data
},
draw_panel = function(data, panel_params, coord, width = 0.9, ...) {
coords <- coord$transform(data, panel_params)
grid::rectGrob(
x = (coords$xmin + coords$xmax)/2,
y = (coords$ymin + coords$ymax)/2,
width = coords$xmax - coords$xmin,
height = coords$ymax - coords$ymin,
default.units = "native",
just = c("center", "center"),
gp = grid::gpar(
col = coords$colour,
fill = alpha(coords$fill, coords$alpha),
lwd = coords$linewidth * .pt,
lty = coords$linetype
)
)
},
parameters = function(complete = FALSE) {
c("na.rm", "width", "padding")
}
)
geom_stackdodge_col <- function(mapping = NULL, data = NULL,
position = "identity",
width = 0.9,
padding = 0.1,
na.rm = FALSE,
show.legend = NA,
inherit.aes = TRUE, ...) {
layer(
geom = GeomStackDodgeCol,
mapping = mapping,
data = data,
stat = "identity",
position = position,
show.legend = show.legend,
inherit.aes = inherit.aes,
params = list(
na.rm = na.rm,
width = width,
padding = padding
)
)
}
df <- bind_rows(
data.frame(
year = rep(2016, 5),
protocol = rep("M", 5),
country = c("A", "B", "C", "D", "E"),
freq = c(100, 50, 30, 40, 11) # sum is 231
),
data.frame(
year = rep(2016, 4),
protocol = rep("L", 4),
country = c("A", "B", "C", "D"),
freq = c(23, 60, 200, 100) # sum is 383
)
)
df <- bind_rows(
df,
df |> mutate(year = 2017, freq = sample(freq)),
)
# Create summary data
df_sum <- df |>
summarise(
label = paste(country, collapse = "\n"),
freq = sum(freq),
.by = c(year, protocol)
)
ggplot() +
geom_stackdodge_col(
data = df,
aes(x = factor(year), y = freq, group = country,
fill = protocol),
width = 0.1, padding = 0.5
)
# Sample data
df <- bind_rows(
data.frame(
year = rep(2016, 5),
protocol = rep("M", 5),
country = c("A", "B", "C", "D", "E"),
freq = c(100, 50, 30, 40, 11)
),
data.frame(
year = rep(2016, 4),
protocol = rep("L", 4),
country = c("A", "B", "C", "D"),
freq = c(23, 60, 200, 100)
),
data.frame(
year = rep(2017, 5),
protocol = rep("M", 5),
country = c("A", "B", "C", "D", "E"),
freq = c(100, 50, 30, 40, 11)
),
data.frame(
year = rep(2017, 4),
protocol = rep("L", 4),
country = c("A", "B", "C", "D"),
freq = c(23, 60, 200, 100)
)
)
a <- ggplot(data = df) +
geom_col(
aes(x = protocol, y = freq,
fill = country, group = country),
position = "stack",
width = 0.4
) +
scale_fill_viridis_d() +
facet_grid(~year)
b <- ggplot(data = df) +
geom_col(
aes(x = as.factor(year), y = freq,
fill = country, group = country),
position = "stack",
width = 0.4
) +
scale_fill_viridis_d() +
facet_grid(~protocol)
a2 <- ggplot(data = df) +
geom_col(
aes(x = protocol, y = freq,
fill = protocol, group = country),
position = "stack", colour = "red",
width = 0.4
) +
scale_fill_viridis_d() +
facet_grid(~year)
c <- ggplot(data = df) +
geom_stackdodge_col(
aes(x = as.factor(year), y = freq,
fill = protocol, group = country),
width = 0.1, padding = 0
) +
scale_fill_viridis_d()
a/a2/c + plot_layout(guides = "collect")

Created on 2025-02-11 with reprex v2.1.1
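The two considerations raised in this reprex can be quantified. A base-R sketch with hypothetical helpers (`group_extent()`, `max_bar_width()`, `dodge_order()`), assuming the same spacing rule as the geom (gaps of `padding * width`) and that discrete x positions are one unit apart, as they are in ggplot2: with the default `width = 0.9`, two dodged bars span 1.89 units and spill into the neighbouring x position. Separately, `unique()` orders the dodge groups by first appearance in the data, whereas discrete scales order legend keys by factor levels (or alphabetically for character vectors), so the bar order and the legend order can disagree.

```r
# Hypothetical helpers illustrating the two considerations; not part of ggplot2

# 1. Default width: full extent of n dodged bars of width `width`
group_extent <- function(n, width = 0.9, padding = 0.1) {
  n * width + (n - 1) * padding * width
}
group_extent(2)  # 1.89, wider than the spacing of 1 between discrete x positions

# Widest bar that keeps n dodged bars (plus gaps) within `slot` x units
max_bar_width <- function(n, padding = 0.1, slot = 0.9) {
  slot / (n + (n - 1) * padding)
}
max_bar_width(2)  # about 0.43

# 2. Ordering: follow factor levels, or sorted values for character vectors,
# instead of the order of first appearance returned by unique()
dodge_order <- function(fill) {
  if (is.factor(fill)) levels(droplevels(fill)) else sort(unique(fill))
}
fill <- c("M", "M", "L", "L")
unique(fill)                    # "M" "L"
dodge_order(fill)               # "L" "M"
match(fill, dodge_order(fill))  # dodge slot per row: 2 2 1 1
```

The last line could stand in for `match(data$fill, fill_groups)` in the geom if bars should follow the legend order.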
Thanks everyone for the examples. I don't think implementation is the barrier for this issue, but Claus' remark below is:
So the relevant question is whether simultaneously stacking and dodging is a core feature or not. On the one hand, I think (but am not wholly convinced) that this can be useful in some circumstances. On the other hand, I agree with Claus that there is most likely a better way to display the data than stacking and dodging.
I know we're somewhat off-topic now, but since the question is "is this a core feature" and not "should this be possible at all", I want to point out that stacking more than two categories is almost always bad, because it's usually impossible to actually compare the stacked data values. I discuss this in my book here: https://clauswilke.com/dataviz/visualizing-proportions.html In addition, I'm not a fan of mixing a display of proportions (which you get by stacking) with a display of absolute values (which you get from bars that have different overall heights). It creates additional confusion in the viewer, as the absolute amount of something may increase from one condition to the other while the relative proportion goes down. I generally discourage people from stacking unless they're dealing with a binary variable (male/female, success/failure, etc.).
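The mismatch between absolute values and proportions can be made concrete with the 2016 example data from this thread. In this short R check (variable names are illustrative), country B has more counts under protocol L than under M, yet occupies a smaller share of the taller L bar.

```r
# Country B in the 2016 example data from this thread
counts <- c(M = 50, L = 60)    # country B's counts under each protocol
totals <- c(M = 231, L = 383)  # full bar heights (all countries stacked)
share <- counts / totals

counts[["L"]] > counts[["M"]]  # TRUE: the absolute count goes up
share[["L"]] < share[["M"]]    # TRUE: the proportion goes down
round(100 * share, 1)          # M 21.6, L 15.7 (percent)
```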
The "can it be done" and "is it being done" questions have already been settled. The question is whether it should be in ggplot2 or in an extension package. ggplot2 is opinionated and, as with secondary axes, we sometimes steer away from "popular" approaches because they are flawed by design. I'm afraid this also falls into that category, which means it will not end up in ggplot2 proper. However, we made the system extensible for a reason, so that you are not beholden to our pet peeves :-)
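Since the extension system is the suggested route, and the original poster mentioned not being able to write a position_ function, here is a minimal sketch of one built on ggplot2's exported `Position` ggproto class. `PositionStackDodge` and `position_stackdodge()` are hypothetical names, not part of ggplot2 or any extension package. The sketch assumes the dodging variable is always `fill`, stacks bars in row order within each x/fill cell, assumes non-negative `y`, and does not handle horizontal (flipped) bars.

```r
library(ggplot2)

# Hypothetical position adjustment (not part of ggplot2): dodge bars by `fill`
# and stack them within each x/fill cell. Assumes non-negative y, vertical bars.
PositionStackDodge <- ggproto("PositionStackDodge", Position,
  width = 0.1,
  padding = 0.5,
  required_aes = c("x", "y", "fill"),

  setup_params = function(self, data) {
    list(width = self$width, padding = self$padding)
  },

  compute_panel = function(data, params, scales) {
    # Dodge slots follow factor levels, or sorted values for character fills
    fill_groups <- if (is.factor(data$fill)) {
      levels(droplevels(data$fill))
    } else {
      sort(unique(data$fill))
    }
    n <- length(fill_groups)
    w <- params$width
    total_width <- n * w + (n - 1) * params$padding * w
    half_span <- (total_width - w) / 2
    offset <- seq(-half_span, half_span, length.out = n)[match(data$fill, fill_groups)]

    # Stack: cumulative sums of y within each x/fill cell, in row order
    key <- paste(unclass(data$x), data$fill, sep = "\r")
    data$ymax <- ave(data$y, key, FUN = cumsum)
    data$ymin <- data$ymax - data$y
    data$y <- data$ymax

    # Dodge: shift each cell sideways and give every bar width w
    data$x <- data$x + offset
    data$xmin <- data$x - w / 2
    data$xmax <- data$x + w / 2
    data
  }
)

# Hypothetical constructor, used like position_stack() or position_dodge()
position_stackdodge <- function(width = 0.1, padding = 0.5) {
  ggproto(NULL, PositionStackDodge, width = width, padding = padding)
}

# Usage with the 2016 data from this thread
df16 <- data.frame(
  year = factor(2016),
  protocol = rep(c("M", "L"), c(5, 4)),
  country = c("A", "B", "C", "D", "E", "A", "B", "C", "D"),
  freq = c(100, 50, 30, 40, 11, 23, 60, 200, 100)
)
p <- ggplot(df16, aes(year, freq, fill = protocol, group = country)) +
  geom_col(position = position_stackdodge(), colour = "black")
```

Because the adjustment only rewrites `x`, `y` and the rectangle bounds, it could in principle be combined with other geoms, which is the advantage the original poster saw in a position_ function over a dedicated geom; that reuse has not been tested here.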
Hi all, please see my pull request #6328. I think this is a needed feature, as shown by the questions collected by @davidhodge931. It was tested only for geom_col, but that seemed to be the most needed anyway. I also think it is more elegant than most of the above solutions. A minimal working example for the above case:
I acknowledge that it doesn't yet work for the labels.
I'd like to propose adding support for simultaneous stacking and dodging controlled by different variables in geom_col. Currently, this common visualization need requires workarounds that are both verbose and harder to maintain.
Current Limitation
When using geom_col, we can either stack or dodge bars based on a grouping variable, but not both at the same time using different variables. This makes it difficult to create visualizations where we want to:
Here's a reprex with counts from surveillance data stratified by year, country and surveillance protocol:
Minimal Reproducible Example
Desired Behavior
Ideally, we would be able to specify both stacking and dodging variables in a single geom_col call, something like:
Use Cases
This functionality would be particularly useful for:
Benefits