Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to zarr 3 and main kerchunk #406

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

jsignell
Copy link
Contributor

@jsignell jsignell commented Jan 30, 2025

Pulls commits from #378, #405 and #393

UPDATE: Ideas for #376 removed in 42414f0

  • Tests added
  • Tests passing
  • Full type hint coverage
  • Changes are documented in docs/releases.rst
  • New functions/methods are listed in api.rst
  • New functionality has documentation

The remaining failures on upstream can be broken into:

  • codecs-related:
    FAILED virtualizarr/tests/test_codecs.py::TestCodecs::test_manifest_array_zarr_v2_normalized - AssertionError: assert (BytesCodec(e...shuffle': 1})) == (Delta(codec_...shuffle': 1}))
    FAILED virtualizarr/tests/test_writers/test_icechunk.py::test_write_loadable_variable - TypeError: Group.create_array() got an unexpected keyword argument 'exists_ok'
    FAILED virtualizarr/tests/test_writers/test_zarr.py::test_zarr_v3_metadata_conformance - AssertionError: assert (True and 1 > 1)
  • probably setting up the in-memory icechunk store wrong - these are new tests that have never worked - removed in 42414f0
    FAILED virtualizarr/tests/test_integration.py::TestRoundtrip::test_roundtrip_no_concat[HDFVirtualBackend-roundtrip_as_in_memory_icechunk] - TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.zarr3.FixedScaleOffset'> instead.
    FAILED virtualizarr/tests/test_integration.py::TestRoundtrip::test_kerchunk_roundtrip_concat[False-time_vars0-HDFVirtualBackend-roundtrip_as_in_memory_icechunk] - TypeError: Expected a BytesBytesCodec. Got <class 'numcodecs.zarr3.FixedScaleOffset'> instead.
    FAILED virtualizarr/tests/test_integration.py::TestRoundtrip::test_kerchunk_roundtrip_concat[True-time_vars1-HDF5VirtualBackend-roundtrip_as_in_memory_icechunk] - TypeError: Group.create_array() got an unexpected keyword argument 'exists_ok'
    FAILED virtualizarr/tests/test_integration.py::TestRoundtrip::test_kerchunk_roundtrip_concat[True-time_vars1-HDFVirtualBackend-roundtrip_as_in_memory_icechunk] - TypeError: Group.create_array() got an unexpected keyword argument 'exists_ok'
    FAILED virtualizarr/tests/test_integration.py::TestRoundtrip::test_datetime64_dtype_fill_value[roundtrip_as_in_memory_icechunk] - KeyError: '<M8[ns]'

@TomNicholas
Copy link
Member

probably setting up the in-memory icechunk store wrong - these are new tests that have never worked

I don't know that this PR can be disentangled from the changes we need to get icechunk working with zarr v3...

This was referenced Jan 30, 2025
@jsignell
Copy link
Contributor Author

I don't know that this PR can be disentangled from the changes we need to get icechunk working with zarr v3...

I can take the new in-memory icechunk stuff and leave the place where it could slot in. Or this can just be used as a reference point. I'm not attached 😄

@TomNicholas TomNicholas added zarr-python Relevant to zarr-python upstream Kerchunk Relating to the kerchunk library / specification itself dependencies Updates a dependency Icechunk 🧊 Relates to Icechunk library / spec v3-migration Required for migration to Zarr-Python 3.0 labels Jan 30, 2025
@TomNicholas
Copy link
Member

So IIUC we can either:

  • Merge this now, break main but only for upstream, then fix the remaining failures in follow-up PRs. That's kind of what Planning explicit dependency on Zarr v3 #392 (comment) implied by separating the "pin zarr" and "fix results of pinning zarr" into separate steps.
  • Just try to fix the codec issues in this PR

@maxrjones
Copy link
Member

So IIUC we can either:

  • Merge this now, break main but only for upstream, then fix the remaining failures in follow-up PRs. That's kind of what Planning explicit dependency on Zarr v3 #392 (comment) implied by separating the "pin zarr" and "fix results of pinning zarr" into separate steps.
  • Just try to fix the codec issues in this PR

I'd slightly prefer if we merged this now because it seems to me like this PR really breaks the stalemate that we were in and would unblock other work. specifically for my work, I'd love to merge it into #407 to get the updates already implemented

@@ -39,7 +39,7 @@ hdf_reader = [
"numcodecs"
]
icechunk = [
"icechunk==0.1.0a8",
"icechunk>=0.1.0a12",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsignell what's the motivation for using a12 rather than a15?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw this comment: #378 (comment) and so I thought a12 would be ok. And I looked at the codebase for the latest tagged version of icechunk and I saw this line which made me nervous about using newer than a12

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks for sharing!

@TomNicholas
Copy link
Member

I'd slightly prefer if we merged this now

I'm happy to merge this now if @jsignell is. We want to block everyone else as little as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Updates a dependency Icechunk 🧊 Relates to Icechunk library / spec Kerchunk Relating to the kerchunk library / specification itself v3-migration Required for migration to Zarr-Python 3.0 zarr-python Relevant to zarr-python upstream
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

4 participants