Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

floating-point issues when extracting the edge of a grid cell #876

Open
tiemvanderdeure opened this issue Jan 27, 2025 · 12 comments
Open

floating-point issues when extracting the edge of a grid cell #876

tiemvanderdeure opened this issue Jan 27, 2025 · 12 comments
Labels
bug Something isn't working

Comments

@tiemvanderdeure
Copy link
Collaborator

In the example below, we extracting from a raster defined by its edges, and in half the cases we don't get the correct cell. In lots of cases we also get missing, sometimes when extracting from the middle of the Raster!

using Rasters, Rasters.Lookups
xdim = 0:0.0416666665:1
ydim = 5:-0.0416666665:-5
ds = (X(xdim; sampling = Intervals(Start())), Y(ydim; sampling = Intervals(Start())))
ras = Raster(DimPoints(ds); dims = ds)
extr = extract(ras, DimPoints(ds), skipmissing = false)
extract_itself = getfield.(extr, 1) .== getfield.(extr, 2)
count(!, skipmissing(extract_itself)) # 3631
count(ismissing, extract_itself) # 150
length(extract_itself) # 6025

I encountered this in a case where I Rasters.sample some points from one raster and then extract them from another with identical dimensions. Which probably isn't too crazy of a use case.

I did a little bit of digging and what is going on here is that the Raster dimensions get converted from a range to a vector of values by _prepare_for_burning, and that allows for floating point issues to become a thing.

@tiemvanderdeure tiemvanderdeure added the bug Something isn't working label Jan 27, 2025
@rafaqz
Copy link
Owner

rafaqz commented Jan 27, 2025

Ahh damn. Does it work if we don't convert to vectors?

@tiemvanderdeure
Copy link
Collaborator Author

I tried just taking _prepare_for_burning out that fixes this issue and tests pass. Why exactly is it needed? It is for performance?

@rafaqz
Copy link
Owner

rafaqz commented Jan 27, 2025

Yeah vectors are much faster. But I think there are other things in that method we need to keep, we can just remove the collector part

@tiemvanderdeure
Copy link
Collaborator Author

Probably shiftlocus would also give issues - how badly do we need that? Otherwise we can keep it for polygons but not for points?

@rafaqz
Copy link
Owner

rafaqz commented Jan 27, 2025

Shiflocus is a no-op for points. But for intervals it would be a lot of work to handle rasterizing all variants

@asinghvi17
Copy link
Collaborator

asinghvi17 commented Jan 27, 2025

Hmm...wouldn't you just need fast_index(locus, lookup, val)? After that, the grid is the grid is the grid...unless I missed some portion of the logic there.

@rafaqz
Copy link
Owner

rafaqz commented Jan 27, 2025

Not sure what you mean there... Rasterization uses y values to make edges, x values during scans. At some point you need to check when you cross them. It also uses them in bresenham.

So there are multiple points you just have to get values out of twice precision ranges and put them in something faster to access.

We could go and write all of that to handle all combinations. But then you need tests for them all.

One day we can do exact arithmetic on them I guess... But it's very far from my top priority

@tiemvanderdeure
Copy link
Collaborator Author

Rasterization uses y values to make edges, x values during scans.

But this is only for lines and polygons, not for points, right?

In practice this whole issue would only really comes up if the points you are extracting are somehow derived from DimPoints and so for lines or polygons it doesn't really matter. Maybe we could call _prepare_for_burning further down, so we never have to call it if for point extraction

@tiemvanderdeure
Copy link
Collaborator Author

I just checked what rasterize does (it does give correct results for the example above). Here it first converts to an array, but then only uses the length and the extent, so it basically treats it as a range anyway later on.

@rafaqz
Copy link
Owner

rafaqz commented Jan 28, 2025

I guess the DimPoints are Start locus? That is likely the main problem. Very prone to switching which interval a point is in.

Probably we do need a separate path for points.

@tiemvanderdeure
Copy link
Collaborator Author

I guess the DimPoints are Start locus? That is likely the main problem. Very prone to switching which interval a point is in.

It is, but Contains already handles that, I think, and extract uses Contains under the hood. So I think we can fix this just by leaving the dimensions untouched when doing point extraction (maybe even point rasterization as well?)

@rafaqz
Copy link
Owner

rafaqz commented Jan 30, 2025

I mean Center won't have this problem because DimPoints values are not near interval edges.

But yes, we need to go through and specialise on Start/Center/End everywhere and not convert.

We also need to not add steps to make intervals, but always use the two adjacent lookup values and bounds.

Then we need to tests it all. I don't see this being a super fun time for anyone, but does need to happen at some stage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants