Brief description of the problem

I am experiencing an error when using spatial_block_cv() from {spatialsample} together with {tidymodels}' tune_grid() to perform spatial cross-validation on my dataset. The same dataset and modeling approach work fine with a standard vfold_cv(), but every fold fails with the following error:
Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1
Error in estimate_tune_results(): ! All models failed. Run show_notes(.Last.tune.result) for more information.
I have verified the following:
No empty folds: I checked the analysis() and assessment() sets in each fold; they all have >0 rows and include both classes (0 and 1). A sketch of this check follows the list.
No recipe issues: I removed all recipe steps (including step_corr() and step_zv()), or even removed the recipe entirely, and the error persists.
Simple model: I tested a non-tunable rand_forest model with fit_resamples() (i.e., no hyperparameter grid) and still see the same failure.
vfold_cv() works: If I switch from spatial_block_cv() to vfold_cv(), the model + data run successfully through tune_grid() or fit_resamples() with no errors.
Indices look correct: The number of rows in analysis() and assessment() is consistent across folds, and each fold includes both humedal == 0 and humedal == 1, so there are no single-class or zero-row subsets.
Tried reducing folds / radius: For instance, v=3 or radius=50 instead of v=5 and radius=100. The same error arises.
Tried removing geometry: I rebuilt the splits with make_splits() to remove the geometry from each fold's analysis/assessment sets and forced class(...) <- class(folds_spatial). The error persists.
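For reference, the per-fold check mentioned above was essentially the loop below. This is a sketch rather than the exact code; folds_spatial is the spatial_block_cv() result created in the repro code further down.

# Sketch of the per-fold sanity check: row counts and class balance
# of humedal in every analysis/assessment pair.
for (i in seq_len(nrow(folds_spatial))) {
  s   <- folds_spatial$splits[[i]]
  ana <- rsample::analysis(s)
  ass <- rsample::assessment(s)
  cat("Fold", i, "- analysis:", nrow(ana), "assessment:", nrow(ass), "\n")
  print(table(ana$humedal))
  print(table(ass$humedal))
}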
Repro steps and partial code
Below is a simplified version of my workflow:
library(tidymodels)
library(spatialsample)
library(sf)
library(terra)
library(dplyr)
library(purrr)
# Example: I have ~827 data points (presence/absence)
# plus 4 raster-based predictors: ndvi, mndwi, pendiente, ti
# My dataset is an sf object with geometry.

# 1) I create folds:
set.seed(1996)
folds_spatial <- spatial_block_cv(
  data = my_data_sf,  # ~827 points
  v = 5,
  radius = 100
)
# 2) Drop geometry in each split:
my_data_nogeo <- st_drop_geometry(my_data_sf)

folds_spatial_nogeo <- folds_spatial %>%
  mutate(
    splits = map(splits, function(s) {
      i_ana <- s$in_id
      i_ass <- s$out_id
      rsample::make_splits(
        x = list(analysis = i_ana, assessment = i_ass),
        data = my_data_nogeo,
        class = "spatial_block_split"
      )
    })
  )
# Restore class
class(folds_spatial_nogeo) <- class(folds_spatial)
# 3) Model specification
rf_spec <- rand_forest(trees = 500) %>%
  set_mode("classification") %>%
  set_engine("ranger", probability = TRUE)

my_wf <- workflow() %>%
  add_model(rf_spec)
# (Sometimes I add a recipe, or none.)
set.seed(1996)
res <- fit_resamples(
  my_wf,
  resamples = folds_spatial_nogeo
)
# -> Fails with:
# Error in data.frame(..., check.names = FALSE) :
#   arguments imply differing number of rows: 0, 1
# All models failed.
I also tried tune_grid() with a small grid of mtry and min_n and got the same result (All models failed).
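Roughly, that attempt looked like the following sketch; the exact grid values are placeholders.

# Sketch of the tuning attempt: same resamples, small mtry/min_n grid.
rf_tune_spec <- rand_forest(trees = 500, mtry = tune(), min_n = tune()) %>%
  set_mode("classification") %>%
  set_engine("ranger")

tune_wf <- workflow() %>%
  add_model(rf_tune_spec)

set.seed(1996)
tune_res <- tune_grid(
  tune_wf,
  resamples = folds_spatial_nogeo,
  grid = tidyr::expand_grid(mtry = c(2, 3), min_n = c(5, 10))
)
# -> Same result: all models fail.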
Observations / Diagnostics
If I switch to vfold_cv(my_data_nogeo, v=5, strata=humedal), everything works (a sketch of that run follows these notes).
The dataset is not huge, but I do have enough rows in each fold (I double-checked with a for loop printing nrow(analysis(...)), nrow(assessment(...)) and the distribution of humedal).
Reducing to v=2 or v=3, or changing radius from 100 to 50, did not help.
Removing any recipe steps or hyperparameter tuning also did not help.
My sessionInfo() is below.
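For comparison, the non-spatial run that succeeds is essentially the sketch below; folds_plain and res_plain are placeholder names.

# Sketch of the non-spatial comparison that completes without errors.
set.seed(1996)
folds_plain <- vfold_cv(my_data_nogeo, v = 5, strata = humedal)

res_plain <- fit_resamples(
  my_wf,
  resamples = folds_plain
)
collect_metrics(res_plain)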
Session Info
# Please see below:
sessionInfo()
# or sessioninfo::session_info()
Any guidance would be greatly appreciated! Thank you for looking into this!

Thank you for the issue! I don't have access to your my_data_sf object. Are you able to reproduce this issue with a reprex (reproducible example), using publicly available or simulated data? A reprex will help me troubleshoot and fix your issue more quickly. 🙂

That said, this looks like it may live more cozily in spatialsample rather than rules, so I will transfer this issue to that repository. The issues you're seeing may be due to the manual transformations you've labeled as step 2).
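For example, something along these lines, with simulated points standing in for my_data_sf, would be plenty. Every name and value here is a placeholder; the key difference from your code is that the spatialsample folds are handed to fit_resamples() directly, without the manual geometry handling.

# A possible skeleton for a reprex with simulated data in place of my_data_sf.
library(tidymodels)
library(spatialsample)
library(sf)

set.seed(1996)
n <- 200
sim_sf <- st_as_sf(
  data.frame(
    humedal = factor(rbinom(n, 1, 0.5)),
    ndvi    = runif(n),
    mndwi   = runif(n),
    x       = runif(n, 0, 1000),
    y       = runif(n, 0, 1000)
  ),
  coords = c("x", "y"),
  crs    = 32719  # any projected CRS works for the example
)

folds <- spatial_block_cv(sim_sf, v = 5)

rf_wf <- workflow() %>%
  add_formula(humedal ~ ndvi + mndwi) %>%
  add_model(
    rand_forest(trees = 500) %>%
      set_mode("classification") %>%
      set_engine("ranger")
  )

# Pass the spatialsample folds straight to fit_resamples(),
# with no manual geometry dropping or make_splits() step.
res <- fit_resamples(rf_wf, resamples = folds)
collect_metrics(res)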