Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Values loaded by NCDatasets backend with "_Unsigned" CF attribute don't correspond to Xarray #735

Open
asinghvi17 opened this issue Sep 11, 2024 · 31 comments
Labels
bug Something isn't working ecosystem

Comments

@asinghvi17
Copy link
Collaborator

MWE:

using FilePathsBase, AWSS3
download(p"s3://noaa-goes16/ABI-L2-SSTF/2020/210/00/OR_ABI-L2-SSTF-M6_G16_s20202100000205_e20202100059513_c20202100105456.nc", "noaa.nc")
using Rasters
r1 = Raster("noaa.nc"; lazy = true)
heatmap(r1)

iTerm2 ZHWLfL

heatmap(read(r1))

iTerm2 LvJSGK

None of this agrees with what xarray loads:
download

but it looks like all CF conventions are applied...

@rafaqz
Copy link
Owner

rafaqz commented Sep 11, 2024

Those just look like different colour schemes?

(Otherwise, to be clear on main NCDatasets.jl is still handling CF conversions, not Rasters)

The first one should call read in the Rasters recipe? Does it even hit that?

@asinghvi17
Copy link
Collaborator Author

No the color scheme is the same, the color limits are different though and it seems the numbers are different as well.

@asinghvi17
Copy link
Collaborator Author

Plus you can see the structure is totally strange before read

@rafaqz
Copy link
Owner

rafaqz commented Sep 11, 2024

Yes read is clearly a problem. The dispatch is wrong somewhere it looks like broken iteration in DiskArrays.

But the non lazy plots without noise are just NCDatasets.jl vs XArray, Rasters is not affecting the values

@asinghvi17
Copy link
Collaborator Author

aha ok, will check out what's going on there. This is on the cf branch so hopefully it's not ignoring some obscure CF convention...

@rafaqz
Copy link
Owner

rafaqz commented Sep 11, 2024

Makes way more sense this is on cf. You gave me a fright that that mess is from main...

It most likely is ignoring or getting some CF thing wrong

@asinghvi17
Copy link
Collaborator Author

FWIW I get this on main too

@rafaqz
Copy link
Owner

rafaqz commented Sep 11, 2024

Both problems or just one. Like if it's both Rasters and NCDatasets CF outputs are the same on main and cf, then maybe it's Xarray... Does it even do CF scaling by default

I think cf_xarray is a separate python package, its not in xarray proper?

@rafaqz
Copy link
Owner

rafaqz commented Sep 11, 2024

The DiskArrays munging I will look at tonight, seems bad if that's on main

@asinghvi17
Copy link
Collaborator Author

Both unfortunately. Yes it looks like Xarray is doing CF scaling, and the dataset is sea surface temperature, so the Xarray data range makes way more sense than the Rasters/NCDatasets one.

Left is xarray collected into a Julia Matrix, right is rasters.
download-3

@felixcremer

This comment was marked as outdated.

@asinghvi17
Copy link
Collaborator Author

You don't for this, it's public access.

@felixcremer

This comment was marked as outdated.

@asinghvi17
Copy link
Collaborator Author

Huh! Here's a google drive link:

https://drive.google.com/file/d/1Om08vXhgsmLbcGmbEFaachfBHa6Nq5pA/view?usp=sharing

@felixcremer
Copy link
Contributor

I can't reproduce your garbled up lazy raster plotting.
I get the same plot for r1 and read(r1)

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Sep 11, 2024

Which OS/DiskArrays versions are you on? I'm on Mac M1 and DiskArrays 0.4.4

@felixcremer
Copy link
Contributor

I am on linux and DiskArrays 0.3.23

@asinghvi17
Copy link
Collaborator Author

Ah, might be a DiskArrays 0.4.x thing then?

@asinghvi17
Copy link
Collaborator Author

also: this plot illustrates more sharply that the nature of the values is different (some places have different signs or something pre scaling?!) between NCDatasets/Rasters and Xarray.

download-5

@felixcremer
Copy link
Contributor

How did you manage to get DiskArrays 0.4 are you on the PR branch from Fabian?
Can you plot the differences between xarray and Raster?

@felixcremer
Copy link
Contributor

This might be a NCDatasets and DiskArrays 0.4 combination thing, because on the other backends DiskArrays 0.4 didn't produce such problematic plots.

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Sep 11, 2024

Ah possibly, yeah I am on Fabian's PR branch for NCDatasets. This also produces the same issues with strange values with the same data but encoded in Zarr by the source (via Kerchunk). So I don't think the values are an interaction issue unless it's an issue with DiskArrays v0.4 in general.

The read thing might be an issue with the PR though, will report it there.

@asinghvi17
Copy link
Collaborator Author

Couldn't get the wrapped arrays to work together because of a dimension mismatch (Dim{:x} v/s X) but collect saved the day:

download-7

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Sep 11, 2024

There's clearly some structure here that I bet is described by some CF attribute. But I don't know what it is!

@rafaqz
Copy link
Owner

rafaqz commented Sep 11, 2024

Haha we have every issue with every problem in branches of DiskArrays/CommonDataModel showing up here!

Can we keep issues here to registered versions of all deps? I will go insane otherwise

@felixcremer
Copy link
Contributor

I can reproduce on a clean env with registerd versions the values that Anshul reported above for opening the data with Raster but I haven't compared to xarray locally. So this issue should rather be about the not applied CF conventions and not about the read, because this might be a DiskArrays 0.4 issue.
In the metadata there is the add_offset of 180 which could be the extrema of your differences.
But I looked at the underlying data that I get from NCDatasets and the applied offset and scale_factor give the same value as I get from the Raster.

@asinghvi17 asinghvi17 changed the title Plotting a lazy raster seems broken? Values loaded by NCDatasets backend don't correspond to Xarray Sep 11, 2024
@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Sep 11, 2024

Does Rasters check the valid_range CF attribute at all?

@rafaqz
Copy link
Owner

rafaqz commented Sep 11, 2024

On main Rasters checks nothing (It's CommonDataModel), in cf I don't think it checks valid_range

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Sep 11, 2024

It turns out there's an attribute _Unsigned in CF which controls whether the type is signed or unsigned?!! No idea why they don't just encode it in the type like normal people

Turns out that was the issue. Might be worth a CommonDataModel.jl issue. But this means that all fillvalues etc need this horribly complicated reinterpretation to unsigned values too. This sounds like a lot of work...but it's a major correctness bug especially on old datasets

@asinghvi17 asinghvi17 changed the title Values loaded by NCDatasets backend don't correspond to Xarray Values loaded by NCDatasets backend with "_Unsigned" CF attribute don't correspond to Xarray Sep 11, 2024
@rafaqz
Copy link
Owner

rafaqz commented Sep 11, 2024

Absolutely worth a CommonDataModel.jl bug report

Probably actually NCDatasets.jl. Rasters should not know about types issues like that the variable should be properly typed

@asinghvi17
Copy link
Collaborator Author

asinghvi17 commented Nov 7, 2024

The DiskArrays portion of this issue (striping in the plots) was fixed by JuliaIO/DiskArrays.jl#198. The unsigned attribute likely needs to be handled in NCDatasets.jl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ecosystem
Projects
None yet
Development

No branches or pull requests

3 participants