
summarize should check if geometry is used #4

Open
kylebutts opened this issue Dec 6, 2023 · 0 comments

I know this is a proof of concept, so feel free to ignore this. Say you have data where the geometry is the same within each group. In sf, I would use `|> summarize(geometry = geometry[1])` to avoid the costly geometry union. This currently doesn't work in sdf (see below). Also, FYI, the new `.by` syntax causes problems.

# %%
library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.3.1
#> Warning: package 'stringr' was built under R version 4.3.1
library(sf) 
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(sdf)
library(tictoc)

data(guerry, package = "sfdep")
g <- dplyr::select(guerry, code_dept, crime_pers, region)

# %% `sf`
tic()
g |>
  summarize(
    mean_crime_pers = mean(crime_pers, na.rm = TRUE),
    .by = region
  )
#> Simple feature collection with 5 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 47680 ymin: 1703258 xmax: 1031401 ymax: 2677441
#> CRS:           NA
#> # A tibble: 5 × 3
#>   region mean_crime_pers                                                geometry
#>   <fct>            <dbl>                                          <MULTIPOLYGON>
#> 1 E               20119. (((381847 1762775, 381116 1763059, 379972 1762874, 378…
#> 2 N               22592. (((381847 1762775, 381116 1763059, 379972 1762874, 378…
#> 3 C               22654. (((381847 1762775, 381116 1763059, 379972 1762874, 378…
#> 4 S               11954. (((381847 1762775, 381116 1763059, 379972 1762874, 378…
#> 5 W               22485. (((381847 1762775, 381116 1763059, 379972 1762874, 378…
toc()
#> 0.23 sec elapsed

tic()
g |>
  summarize(
    mean_crime_pers = mean(crime_pers, na.rm = TRUE),
    g = geometry[1],
    .by = region
  )
#> Simple feature collection with 5 features and 2 fields
#> Geometry type: MULTIPOLYGON
#> Dimension:     XY
#> Bounding box:  xmin: 382200 ymin: 1920016 xmax: 894622 ymax: 2564568
#> CRS:           NA
#> # A tibble: 5 × 3
#>   region mean_crime_pers                                                       g
#>   <fct>            <dbl>                                          <MULTIPOLYGON>
#> 1 E               20119. (((801150 2092615, 800669 2093190, 800688 2095430, 800…
#> 2 N               22592. (((729326 2521619, 729320 2521230, 729280 2518544, 728…
#> 3 C               22654. (((710830 2137350, 711746 2136617, 712430 2135212, 712…
#> 4 S               11954. (((747008 1925789, 746630 1925762, 745723 1925138, 744…
#> 5 W               22485. (((456425 2120055, 456229 2120382, 455943 2121064, 456…
toc()
#> 0.015 sec elapsed

# %% `sdf`
g_sdf <- as_sdf(g)

tic()
g_sdf |>
  group_by(region) |>
  summarize(
    mean_crime_pers = mean(crime_pers, na.rm = TRUE)
  )
#> Spatial Data Frame
#> Geometry Type: sfc_MULTIPOLYGON
#> Bounding box: xmin: 47680 ymin: 1703258 xmax: 1031401 ymax: 2677441
#> # A tibble: 5 × 3
#>   region mean_crime_pers                                                geometry
#>   <fct>            <dbl>                                          <MULTIPOLYGON>
#> 1 C               22654. (((710830 2137350, 711746 2136617, 712430 2135212, 712…
#> 2 E               20119. (((801150 2092615, 800669 2093190, 800688 2095430, 800…
#> 3 N               22592. (((729326 2521619, 729320 2521230, 729280 2518544, 728…
#> 4 S               11954. (((747008 1925789, 746630 1925762, 745723 1925138, 744…
#> 5 W               22485. (((456425 2120055, 456229 2120382, 455943 2121064, 456…
toc()
#> 0.139 sec elapsed

tic()
g_sdf |>
  group_by(region) |>
  summarize(
    mean_crime_pers = mean(crime_pers, na.rm = TRUE),
    g = geometry[1]
  )
#> Error:
#> ! The `j` argument of `[[.tbl_df()` can't be a vector of length 2 as of
#>   tibble 3.0.0.
#> ℹ Recursive subsetting is deprecated for tibbles.
#> Backtrace:
#>      ▆
#>   1. ├─dplyr::summarize(...)
#>   2. └─sdf::summarise.sdf(...)
#>   3.   └─sdf::as_sdf(res)
#>   4.     ├─tibble::new_tibble(...)
#>   5.     │ └─rlang::pairlist2(...)
#>   6.     ├─sdf:::validate_bbox(bounding_box(x[[geom_col]]))
#>   7.     │ └─base::stopifnot(...)
#>   8.     ├─sdf::bounding_box(x[[geom_col]])
#>   9.     ├─x[[geom_col]]
#>  10.     └─tibble:::`[[.tbl_df`(x, geom_col)
#>  11.       └─tibble:::tbl_subset2(x, j = i, j_arg = substitute(i))
#>  12.         └─lifecycle::deprecate_stop(...)
#>  13.           └─lifecycle:::deprecate_stop0(msg)
#>  14.             └─rlang::cnd_signal(...)
toc()
#> 0.026 sec elapsed

Created on 2023-12-06 with reprex v2.0.2
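One possible shape for the check in the title, as a sketch under assumptions: before unioning, the summarise method could inspect the unevaluated expressions and skip the union when the geometry column is assigned or read. `uses_geometry()` below is a hypothetical helper, not part of sdf or sf, and it assumes the method knows the name of its geometry column. It uses only base R (`substitute()`, `all.names()`), so nothing is evaluated while checking.

```r
# Hypothetical helper (not part of sdf or sf): returns TRUE when any
# summarize() expression either assigns to or reads the geometry column.
uses_geometry <- function(geom_col, ...) {
  # Capture the unevaluated `...` expressions, dropping the leading `list` call
  exprs <- as.list(substitute(list(...)))[-1]
  # An output name such as `geometry = geometry[1]` assigns the geometry column
  assigned <- geom_col %in% names(exprs)
  # all.names() lists every symbol in an expression, e.g. c("[", "geometry")
  referenced <- vapply(exprs, function(e) geom_col %in% all.names(e), logical(1))
  assigned || any(referenced)
}

# Reads `geometry`, so the method could skip the union
uses_geometry("geometry", mean_crime_pers = mean(crime_pers, na.rm = TRUE), g = geometry[1])
#> [1] TRUE

# Never touches `geometry`, so the default union would still apply
uses_geometry("geometry", mean_crime_pers = mean(crime_pers, na.rm = TRUE))
#> [1] FALSE
```

Skipping the union alone may not be enough for the `g = geometry[1]` case: the backtrace shows `as_sdf(res)` indexing with a `geom_col` of length 2, presumably because the result holds both the unioned `geometry` and the new `g` column, so the method would also need to keep a single geometry column.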
