There are some very useful functions in the package for dealing with missing values. Would it be possible to develop a function like SAS's `NMISS` or `CMISS` that counts missing values?

I drafted the function below, modeled on `dplyr::coalesce()`, but a more useful and robust function from your talented developers would be better.
```r
# Count the number of missing values per position
library(dplyr, warn.conflicts = FALSE)

cmiss <- function(..., .blanks_to_na = TRUE) {
  args <- rlang::list2(...)
  if (length(args) == 0L) {
    rlang::abort("`...` can't be empty.")
  }
  args <- vctrs::vec_recycle_common(!!!args)
  stopifnot(length(.blanks_to_na) == 1, is.logical(.blanks_to_na))
  if (.blanks_to_na) {
    args <- purrr::map_if(args, is.character, ~ dplyr::na_if(.x, ""))
  }
  purrr::pmap_int(purrr::map(args, is.na), sum)
}

a <- c(1, 2, NA)
b <- c(3, NA, 4)
c <- "c"
d <- c("NA", "", NA)

# treat "" as `NA` by default
cmiss(a, b, c, d)
#> [1] 0 2 2
cmiss(a, b, c, d, .blanks_to_na = FALSE)
#> [1] 0 1 2

df <- data.frame(v1 = c("a", NA, "b", NA, NA),
                 v2 = c(NA, "c", "d", NA, NA),
                 v3 = c(letters[5:8], NA),
                 v4 = rep(NA, 5))

df %>%
  mutate(n_miss = cmiss(v1, v2, v3, v4),
         first_non_missing = coalesce(v1, v2, v3, v4))
#>     v1   v2   v3 v4 n_miss first_non_missing
#> 1    a <NA>    e NA      2                 a
#> 2 <NA>    c    f NA      2                 c
#> 3    b    d    g NA      1                 b
#> 4 <NA> <NA>    h NA      3                 h
#> 5 <NA> <NA> <NA> NA      4              <NA>
```
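If the rlang/vctrs/purrr dependencies are a concern, the same counting logic can be sketched in base R. `cmiss_base()` below is a hypothetical name, not an existing function; it assumes only length-1 inputs should be recycled (mirroring what `vctrs::vec_recycle_common()` allows) and treats `""` as missing only in character inputs, like the draft above.

```r
# Hypothetical dependency-free variant of the cmiss() draft (base R only)
cmiss_base <- function(..., .blanks_to_na = TRUE) {
  args <- list(...)
  if (length(args) == 0L) stop("`...` can't be empty.")
  stopifnot(length(.blanks_to_na) == 1, is.logical(.blanks_to_na))
  n <- max(lengths(args))
  # assumption: mimic tidyverse recycling, where only length-1 inputs are recycled
  if (!all(lengths(args) %in% c(1L, n))) {
    stop("Inputs must have length 1 or a common length.")
  }
  miss <- lapply(args, function(x) {
    m <- is.na(x)
    # optionally count empty strings in character inputs as missing
    if (.blanks_to_na && is.character(x)) m <- m | (!is.na(x) & x == "")
    rep_len(m, n)
  })
  # adding the logical vectors element-wise gives the count at each position
  as.integer(Reduce(`+`, miss))
}

a <- c(1, 2, NA)
b <- c(3, NA, 4)
chr <- "c"
d <- c("NA", "", NA)

cmiss_base(a, b, chr, d)
#> [1] 0 2 2
cmiss_base(a, b, chr, d, .blanks_to_na = FALSE)
#> [1] 0 1 2
```

Note that the string `"NA"` is not counted as missing in either call; only true `NA` values (and, by default, `""`) are.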
I think this can actually be done quite elegantly with `across()` + `rowSums()`:

```r
library(dplyr, warn.conflicts = FALSE)

blanks_to_na <- function(x) {
  dplyr::na_if(x, "")
}

df <- data.frame(v1 = c("a", NA, "b", NA, NA),
                 v2 = c(NA, "c", "d", "", NA),
                 v3 = c(letters[5:8], NA),
                 v4 = rep(NA, 5)
)

df %>%
  dplyr::mutate(across(c(v1, v2, v3), blanks_to_na)) %>%
  dplyr::mutate(n_miss = rowSums(across(c(v1, v2, v3, v4), is.na)))
#>     v1   v2   v3 v4 n_miss
#> 1    a <NA>    e NA      2
#> 2 <NA>    c    f NA      2
#> 3    b    d    g NA      1
#> 4 <NA> <NA>    h NA      3
#> 5 <NA> <NA> <NA> NA      4
```
This also has the benefit of letting you use tidyselect in the across() call to select the columns.
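As an illustration of the tidyselect point, here is a sketch on the same example data that picks columns with the `where()` and `starts_with()` helpers (both re-exported by dplyr) instead of listing them by name. The particular selections are illustrative choices; `where(is.character)` conveniently skips the all-`NA` logical column `v4` when converting blanks.

```r
library(dplyr, warn.conflicts = FALSE)

blanks_to_na <- function(x) dplyr::na_if(x, "")

df <- data.frame(v1 = c("a", NA, "b", NA, NA),
                 v2 = c(NA, "c", "d", "", NA),
                 v3 = c(letters[5:8], NA),
                 v4 = rep(NA, 5))

res <- df %>%
  # convert "" to NA in every character column, whatever its name
  mutate(across(where(is.character), blanks_to_na)) %>%
  # count NAs across all columns whose names start with "v"
  mutate(n_miss = rowSums(across(starts_with("v"), is.na)))

res$n_miss
#> [1] 2 2 1 3 4
```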
It's not a bad idea to have a psummissing() style function like cmiss(), but I think it is a little too niche for tidyr! Sounds nice for a separate package though!
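For the data-frame case, base R can also produce the per-row counts without a new function: `is.na()` on a data frame returns a logical matrix, which `rowSums()` then sums row by row. A minimal sketch, assuming blanks should only be converted in character columns:

```r
df <- data.frame(v1 = c("a", NA, "b", NA, NA),
                 v2 = c(NA, "c", "d", "", NA),
                 v3 = c(letters[5:8], NA),
                 v4 = rep(NA, 5))

# base-R analogue of na_if(x, ""), applied to character columns only
chr_cols <- vapply(df, is.character, logical(1))
df[chr_cols] <- lapply(df[chr_cols], function(x) replace(x, x %in% "", NA))

# is.na() on a data frame gives a logical matrix; rowSums() counts TRUEs per row
n_miss <- rowSums(is.na(df))
n_miss
```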
Created on 2024-02-24 with reprex v2.1.0