Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request option not to warn when using cheap (or even test distances) #34

Open
daniellemccool opened this issue Apr 27, 2021 · 2 comments

Comments

@daniellemccool
Copy link
Contributor

I use geodist to test whether or not two points are within a very small range of each other (80 - 200 meters) while identifying stops.

For these purposes, cheap is just fine, so I'd prefer to use it. Unfortunately, not all the points that are tested are close together. What this usually means is despite the fact that any of the measures that ARE above 100km are uninteresting to me anyway, I have to deal with a wall of warnings and, presumably, some efficiency loss because it's being tested every time.

It's a minor annoyance, but maybe there are others that feel this way?

@mpadge
Copy link
Member

mpadge commented Apr 28, 2021

That's a good idea. The good news in the meantime is that there really should be no loss of efficiency - the max dist is calculated as cheaply as possibly, and only one time from the entire input data. You can also use suppressWarnings to, well, do just that. This code demonstrates both of these points:

library (geodist)
f <- function (n = 1000, range = 1.0) {
    xy <- data.frame (x = runif (n, -range, range),
                      y = runif (n, -range, range))
    suppressWarnings (
        d <- geodist (xy)
        )
}
bench::mark (d10 <- f (range = 1.0),
             d01 <- f (range = 0.1),
             check = FALSE)
#> # A tibble: 2 x 6
#>   expression                 min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>            <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 d10 <- f(range = 1)        6ms   9.55ms      93.5        NA     158.
#> 2 d01 <- f(range = 0.1)   5.75ms   7.43ms     119.         NA     112.

Created on 2021-04-28 by the reprex package (v2.0.0.9000)

That said, i still think this would be a useful enhancement. Would you like to submit a pull request? You'd just need to introduce an extra parameter, which would by convention be called quiet = FALSE, to the main functions, and then the 2 places (in geodist.R and geodist-vec.R) where check_max_d() is called, replace lines like these:

geodist/R/geodist.R

Lines 82 to 83 in fab7e24

if (measure == "cheap")
check_max_d (res, measure)

With something like:

if (measure == "cheap" & quiet)
    check_max_d (res, measure)

Then we'd just need to explain in the documentation that setting quiet = TRUE might yield inaccurate distances with default cheap.

@daniellemccool
Copy link
Contributor Author

Sure, I'd be happy to!

Regarding the efficiency, it may well not be much of a loss, but it's hard to overstate the sheer number of times I'm calling geodist_vec. Ultimately I've had difficulties developing an algorithm where I can efficiently test the distances between the necessary pairs of points (up to 64k) without overshooting available memory. So as it stands, rather than calling geodist once, it often gets called thousands to tens of thousands of times on a single data set as I work through to find the ends of stops. (And then I do that 200 times in each of 580 data sets.)

Frankly, geodist is incredibly performant. It's far from the most imposing bottleneck (that's joins), which is why I haven't considered creating my own calculation in c with just the parts I need. There's very little overhead.

I'll submit a pull request with the change. Thanks for always being so responsive!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants