“All models failed” with spatial_block_cv() + tune_grid() — “arguments imply differing number of rows: 0, 1” error #165

Open
andrestyle16 opened this issue Dec 26, 2024 · 1 comment

@andrestyle16

Brief description of the problem

I am experiencing an error when using spatial_block_cv() from {spatialsample} together with tune_grid() from {tidymodels} to perform spatial cross-validation on my dataset. The same dataset and modeling approach work fine with a standard vfold_cv(), but with spatial folds every fold fails with these errors:

Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1
Error in estimate_tune_results(): ! All models failed. Run show_notes(.Last.tune.result) for more information.

I have verified the following:

  1. No empty folds: I checked analysis() and assessment() sets in each fold; they all have >0 rows and include both classes (0 and 1).
  2. No recipe issues: I removed all recipe steps (including step_corr() and step_zv()), or even removed the recipe entirely, and the error persists.
  3. Simple model: I tested a non-tunable rand_forest model with fit_resamples() (i.e., no hyperparameter grid) and still see the same failure.
  4. vfold_cv() works: If I switch from spatial_block_cv() to vfold_cv(), the model + data run successfully through tune_grid() or fit_resamples() with no errors.
  5. Indices look correct: The number of rows in analysis() and assessment() is consistent across folds, each includes humedal == 0 and humedal == 1, so no single-class or 0-row subsets.
  6. Tried reducing folds / radius: For instance, v=3 or radius=50 instead of v=5 and radius=100. The same error arises.
  7. Tried removing geometry: I used a typical approach of reassigning splits with make_splits() to remove geometry from each fold’s analysis/assessment sets, and forced class(...) <- class(folds_spatial). The error persists.
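
Checks 1 and 5 above can be scripted rather than inspected by hand. The following is a minimal sketch, not the code used in this report: summarise_folds() is a hypothetical helper, and the toy data frame and vfold_cv() folds only stand in for my_data_sf and folds_spatial so that the block runs on its own.

```r
library(rsample)

# Hypothetical helper: one row per fold with set sizes and the number of
# distinct outcome classes in the analysis and assessment sets.
summarise_folds <- function(folds, outcome) {
  rows <- lapply(seq_along(folds$splits), function(i) {
    ana <- analysis(folds$splits[[i]])
    ass <- assessment(folds$splits[[i]])
    data.frame(
      fold               = folds$id[i],
      n_analysis         = nrow(ana),
      n_assessment       = nrow(ass),
      classes_analysis   = length(unique(ana[[outcome]])),
      classes_assessment = length(unique(ass[[outcome]]))
    )
  })
  do.call(rbind, rows)
}

# Toy stand-in data (assumption): 100 rows, balanced binary outcome.
set.seed(1996)
toy <- data.frame(
  humedal = factor(rep(c(0, 1), each = 50)),
  ndvi    = rnorm(100)
)
toy_folds <- vfold_cv(toy, v = 5, strata = humedal)

fold_check <- summarise_folds(toy_folds, "humedal")
print(fold_check)
```

Passing folds_spatial and "humedal" in place of the toy objects should produce the same per-fold table for the spatial folds.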

Repro steps and partial code

Below is a simplified version of my workflow:

library(tidymodels)
library(spatialsample)
library(sf)
library(terra)
library(dplyr)
library(purrr)

# Example: I have ~827 data points (presence/absence)
# plus 4 raster-based predictors: ndvi, mndwi, pendiente, ti
# My dataset is an sf object with geometry.

# 1) I create folds:
set.seed(1996)
folds_spatial <- spatial_block_cv(
  data   = my_data_sf,   # ~827 points
  v      = 5,
  radius = 100
)

# 2) Drop geometry in each split:
my_data_nogeo <- st_drop_geometry(my_data_sf)

folds_spatial_nogeo <- folds_spatial %>%
  mutate(
    splits = map(splits, function(s) {
      i_ana <- s$in_id
      i_ass <- s$out_id
      
      rsample::make_splits(
        x     = list(analysis = i_ana, assessment = i_ass),
        data  = my_data_nogeo,
        class = "spatial_block_split"
      )
    })
  )
# Restore class
class(folds_spatial_nogeo) <- class(folds_spatial)

# 3) Model specification
rf_spec <- rand_forest(trees = 500) %>%
  set_mode("classification") %>%
  set_engine("ranger", probability = TRUE)

my_wf <- workflow() %>%
  add_model(rf_spec)
  # (Sometimes I add a recipe, or none.)

set.seed(1996)
res <- fit_resamples(
  my_wf,
  resamples = folds_spatial_nogeo
)
# -> Fails with:
# Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 0, 1
# All models failed.

I also tried tune_grid() with a small grid of mtry and min_n and got the same result (All models failed).

Observations / Diagnostics

  • If I switch to vfold_cv(my_data_nogeo, v=5, strata=humedal), everything works.
  • The dataset is not huge, but I do have enough rows in each fold (I double-checked with a for loop printing nrow(analysis(...)), nrow(assessment(...)) and the distribution of humedal).
  • Reducing to v=2 or v=3, or changing radius from 100 to 50, did not help.
  • Removing any recipe steps or hyperparameter tuning also did not help.
  • My sessionInfo() is below.

Session Info

# Please see below:
sessionInfo()
# or sessioninfo::session_info()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: America/Santiago
tzcode source: internal

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mapview_2.11.2      spatialsample_0.6.0 vip_0.4.1           DALEX_2.4.3         GGally_2.2.1       
 [6] corrplot_0.95       doParallel_1.0.17   iterators_1.0.14    foreach_1.5.2       ranger_0.17.0      
[11] yardstick_1.3.1     workflowsets_1.1.0  workflows_1.1.4     tune_1.2.1          rsample_1.2.1      
[16] recipes_1.1.0       parsnip_1.2.1       modeldata_1.4.0     infer_1.0.7         dials_1.3.0        
[21] scales_1.3.0        broom_1.0.7         tidymodels_1.2.0    janitor_2.2.1       here_1.0.1         
[26] terra_1.8-5         sf_1.0-19           lubridate_1.9.4     forcats_1.0.0       stringr_1.5.1      
[31] dplyr_1.1.4         purrr_1.0.2         readr_2.1.5         tidyr_1.3.1         tibble_3.2.1       
[36] ggplot2_3.5.1       tidyverse_2.0.0    

loaded via a namespace (and not attached):
 [1] DBI_1.2.3           rlang_1.1.4         magrittr_2.0.3      snakecase_0.11.1    furrr_0.3.1        
 [6] e1071_1.7-16        compiler_4.4.1      png_0.1-8           vctrs_0.6.5         lhs_1.2.0          
[11] fastmap_1.2.0       pkgconfig_2.0.3     backports_1.5.0     leafem_0.2.3        utf8_1.2.4         
[16] prodlim_2024.06.25  tzdb_0.4.0          satellite_1.0.5     xfun_0.49           R6_2.5.1           
[21] stringi_1.8.4       RColorBrewer_1.1-3  parallelly_1.41.0   rpart_4.1.23        Rcpp_1.0.13-1      
[26] knitr_1.49          future.apply_1.11.3 base64enc_0.1-3     Matrix_1.7-0        splines_4.4.1      
[31] nnet_7.3-19         timechange_0.3.0    tidyselect_1.2.1    rstudioapi_0.17.1   timeDate_4041.110  
[36] codetools_0.2-20    listenv_0.9.1       lattice_0.22-6      plyr_1.8.9          withr_3.0.2        
[41] evaluate_1.0.1      future_1.34.0       survival_3.6-4      ggstats_0.7.0       units_0.8-5        
[46] proxy_0.4-27        pillar_1.10.0       rsconnect_1.3.3     KernSmooth_2.23-24  stats4_4.4.1       
[51] generics_0.1.3      sp_2.1-4            rprojroot_2.0.4     hms_1.1.3           munsell_0.5.1      
[56] globals_0.16.3      class_7.3-22        glue_1.8.0          tools_4.4.1         data.table_1.16.4  
[61] gower_1.0.2         grid_4.4.1          crosstalk_1.2.1     ipred_0.9-15        colorspace_2.1-1   
[66] raster_3.6-30       cli_3.6.3           DiceDesign_1.10     lava_1.8.0          gtable_0.3.6       
[71] GPfit_1.0-8         digest_0.6.37       classInt_0.4-10     farver_2.1.2        htmlwidgets_1.6.4  
[76] htmltools_0.5.8.1   leaflet_2.2.2       lifecycle_1.0.4     hardhat_1.4.0       MASS_7.3-60.2

Any guidance would be greatly appreciated!
I suspect either:

  • A subtle bug or mismatch in how spatial_block_cv() interacts with analysis()/assessment() inside tidymodels, or
  • Some unknown configuration in my environment that leads to arguments imply differing number of rows: 0, 1.

Thank you for looking into this!

@simonpcouch

Thank you for the issue! I don't have access to your my_data_sf object. Are you able to reproduce this issue with a reprex (reproducible example), using publicly available or simulated data? A reprex will help me troubleshoot and fix your issue more quickly. 🙂

That said, this looks like it may live more cozily in spatialsample rather than rules, so I will transfer this issue to that repository. The issues you're seeing may be due to the manual transformations you've labeled step 2).
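
A reprex along these lines could start from simulated points. The sketch below is a starting point under stated assumptions, not a confirmed reproduction of the error: the simulated coordinates, predictor values, outcome rule, and CRS (EPSG:32719) are all invented for illustration. Only the column names, v = 5, radius = 100, and the ranger specification follow the original report. It passes the sf folds to fit_resamples() directly, without the manual make_splits() step, and adds a formula preprocessor, which the partial code in the report does not show.

```r
library(tidymodels)
library(spatialsample)
library(sf)

set.seed(1996)
n <- 300

# Simulated stand-in for my_data_sf (assumption): random points in a
# 10 km x 10 km square with four numeric predictors.
sim <- data.frame(
  x         = runif(n, 0, 10000),
  y         = runif(n, 0, 10000),
  ndvi      = rnorm(n),
  mndwi     = rnorm(n),
  pendiente = rnorm(n),
  ti        = rnorm(n)
)
# Invented outcome rule so that both classes are present.
sim$humedal <- factor(ifelse(sim$ndvi + rnorm(n) > 0, 1, 0))

# Projected CRS chosen for illustration only, so that radius is in metres.
sim_sf <- st_as_sf(sim, coords = c("x", "y"), crs = 32719)

folds <- spatial_block_cv(sim_sf, v = 5, radius = 100)

rf_spec <- rand_forest(trees = 500) %>%
  set_mode("classification") %>%
  set_engine("ranger", probability = TRUE)

wf <- workflow() %>%
  add_formula(humedal ~ ndvi + mndwi + pendiente + ti) %>%
  add_model(rf_spec)

set.seed(1996)
res <- fit_resamples(wf, resamples = folds)

# Print any warnings or errors captured during resampling.
show_notes(res)
```

If this runs cleanly, reintroducing the geometry-dropping step from the report one piece at a time would help isolate which transformation triggers the failure.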

@simonpcouch transferred this issue from tidymodels/rules on Jan 4, 2025