-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Request option not to warn when using cheap (or even test distances) #34
Comments
That's a good idea. The good news in the meantime is that there really should be no loss of efficiency - the max dist is calculated as cheaply as possibly, and only one time from the entire input data. You can also use library (geodist)
f <- function (n = 1000, range = 1.0) {
xy <- data.frame (x = runif (n, -range, range),
y = runif (n, -range, range))
suppressWarnings (
d <- geodist (xy)
)
}
bench::mark (d10 <- f (range = 1.0),
d01 <- f (range = 0.1),
check = FALSE)
#> # A tibble: 2 x 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 d10 <- f(range = 1) 6ms 9.55ms 93.5 NA 158.
#> 2 d01 <- f(range = 0.1) 5.75ms 7.43ms 119. NA 112. Created on 2021-04-28 by the reprex package (v2.0.0.9000) That said, i still think this would be a useful enhancement. Would you like to submit a pull request? You'd just need to introduce an extra parameter, which would by convention be called Lines 82 to 83 in fab7e24
With something like: if (measure == "cheap" & quiet)
check_max_d (res, measure) Then we'd just need to explain in the documentation that setting |
Sure, I'd be happy to! Regarding the efficiency, it may well not be much of a loss, but it's hard to overstate the sheer number of times I'm calling geodist_vec. Ultimately I've had difficulties developing an algorithm where I can efficiently test the distances between the necessary pairs of points (up to 64k) without overshooting available memory. So as it stands, rather than calling geodist once, it often gets called thousands to tens of thousands of times on a single data set as I work through to find the ends of stops. (And then I do that 200 times in each of 580 data sets.) Frankly, geodist is incredibly performant. It's far from the most imposing bottleneck (that's joins), which is why I haven't considered creating my own calculation in c with just the parts I need. There's very little overhead. I'll submit a pull request with the change. Thanks for always being so responsive! |
I use geodist to test whether or not two points are within a very small range of each other (80 - 200 meters) while identifying stops.
For these purposes, cheap is just fine, so I'd prefer to use it. Unfortunately, not all the points that are tested are close together. What this usually means is despite the fact that any of the measures that ARE above 100km are uninteresting to me anyway, I have to deal with a wall of warnings and, presumably, some efficiency loss because it's being tested every time.
It's a minor annoyance, but maybe there are others that feel this way?
The text was updated successfully, but these errors were encountered: