Can't conveniently open zip store from path with zarr v3 #2831

Open
aulemahal opened this issue Feb 13, 2025 · 1 comment · May be fixed by #2856
Labels
bug Potential issues with the zarr-python library

Comments


aulemahal commented Feb 13, 2025

Zarr version

v3

Numcodecs version

v0.15.1

Python Version

3.12

Operating System

Linux

Installation

Mamba

Description

In zarr v2, passing a path (str) ending with *.zip to zarr.open or zarr.open_consolidated worked transparently, the same way as if a *.zarr path had been given.

In zarr v3, this fails with FileNotFoundError: Unable to find group. The workaround is to open a ZipStore explicitly and then pass that to zarr.open.
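The workaround described above can be sketched as a small user-side wrapper. This is a minimal sketch, not part of zarr-python: `is_zip_path` and `open_readonly` are hypothetical names, the check looks only at the file extension, and it assumes `zarr.storage.ZipStore(path, mode="r")` and `zarr.open(store=..., mode="r")` behave as in the reproduction script in this issue.

```python
from pathlib import Path


def is_zip_path(path: str) -> bool:
    """Hypothetical helper: decide from the file name alone."""
    return Path(path).suffix.lower() == ".zip"


def open_readonly(path: str):
    """Open a zarr hierarchy read-only, wrapping *.zip paths in a ZipStore."""
    import zarr  # imported lazily so is_zip_path stays usable without zarr

    if is_zip_path(path):
        # Explicit store handling, as zarr v3 currently requires for zip files.
        # The ZipStore is left open for the caller to use.
        store = zarr.storage.ZipStore(path, mode="r")
        return zarr.open(store=store, mode="r")
    # Anything else is passed through as a plain path.
    return zarr.open(store=path, mode="r")
```

With such a wrapper, `open_readonly("example-3.zip")` and `open_readonly("example.zarr")` would go through the same call site, which is the convenience that v2 provided.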

The zarr.open docstring says:

Parameters
----------
store : Store or str, optional
    Store or path to directory in file system or name of zip file.

which seems to imply that passing a string path ending in zip should work.

Moreover, I found it more convenient when one didn't need to test for file extensions and explicitly handle storage objects. I use zarr datasets through xarray, and it seems to me that xr.open_dataset('example.zarr.zip', engine='zarr') should be how a typical user opens such a file.

I understand that the ZipStore is still "experimental" in v3, and I really hope you keep it in the official scheme because it is very useful, to us at least. Zipped zarrs have many of the benefits of zarr (over netCDF, for example), but without the inode explosion that plain zarr directories create on Unix filesystems (which slows down disk operations).

I think the store guessing happens in zarr.storage._common.make_store_path?
The zip case could be handled here:

elif isinstance(store_like, Path):
    store = await LocalStore.open(root=store_like, read_only=_read_only)
elif isinstance(store_like, str):
    storage_options = storage_options or {}
    if _is_fsspec_uri(store_like):
        used_storage_options = True
        store = FsspecStore.from_url(
            store_like, storage_options=storage_options, read_only=_read_only
        )
    else:
        store = await LocalStore.open(root=Path(store_like), read_only=_read_only)

Would this convenience be welcomed back in zarr-python? I could do a PR if the team here agrees with adding this case handling. To avoid pure string checking, one could even use zipfile.is_zipfile from the standard library to check for zip stores.
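The `zipfile.is_zipfile` idea can be illustrated with the standard library alone. The sketch below is a rough illustration, not the change proposed for `make_store_path`: `guess_store_kind` is a hypothetical helper, and falling back to the file extension when nothing exists on disk yet is an assumption made here so that newly created zip stores are still recognised.

```python
import tempfile
import zipfile
from pathlib import Path


def guess_store_kind(store_like: str) -> str:
    """Classify a local path as "zip" or "local" (hypothetical helper)."""
    path = Path(store_like)
    if path.exists():
        # is_zipfile inspects the file's contents; it returns False for directories.
        return "zip" if zipfile.is_zipfile(path) else "local"
    # Nothing on disk yet (e.g. creating a new store): only the name is available.
    return "zip" if path.suffix.lower() == ".zip" else "local"


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "data.bin"  # a zip archive without a .zip extension
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("zarr.json", "{}")
    directory = Path(tmp) / "example.zarr"
    directory.mkdir()

    kinds = {
        "archive": guess_store_kind(str(archive)),
        "directory": guess_store_kind(str(directory)),
        "missing_zip": guess_store_kind(str(Path(tmp) / "new.zip")),
        "missing_dir": guess_store_kind(str(Path(tmp) / "new.zarr")),
    }
```

Checking the contents rather than the name means an existing archive is detected even without a `.zip` suffix, at the cost of opening the file once before the store is constructed.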

Otherwise, I guess this could be done by xarray itself? Many of my scripts also go through intake-esm, so I guess we could fix it there too if the proposal gets refused here.

Steps to reproduce

Example adapted from the doc.

import numpy as np
import zarr

store = zarr.storage.ZipStore("example-3.zip", mode='w')
z = zarr.create_array(
    store=store,
    shape=(100, 100),
    chunks=(10, 10),
    dtype="f4"
)
z[:, :] = np.random.random((100, 100))
store.close()

zarr.open('example-3.zip', mode='r')

Additional output

No response

@aulemahal aulemahal added the bug Potential issues with the zarr-python library label Feb 13, 2025
Contributor

d-v-b commented Feb 14, 2025

> Would this convenience be welcomed back in zarr-python? I could do a PR if the team here agrees with adding this case handling. To avoid pure string checking, one could even use zipfile.is_zipfile from the standard library to check for zip stores.

We absolutely want this, and a PR that routes paths ending in .zip to ZipStore would be appreciated.

Ultimately I think all the stores should be URL-addressable: for every store, users should be able to provide a string that tells zarr how to open the store. But making this robust requires setting up a registry that maps storage protocols onto store classes, and nobody has started working on this yet.
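One possible shape for such a registry, as a sketch only: no such registry exists in zarr-python according to this thread, so every name below (`register_store`, `resolve_store`, the `zip://` and `file://` prefixes) is hypothetical, and the openers return placeholder tuples instead of real store objects.

```python
from typing import Callable

# Hypothetical registry: protocol name -> function that builds a store from the
# remainder of the URL. Nothing here is zarr-python API.
_STORE_OPENERS: dict[str, Callable[[str], tuple[str, str]]] = {}


def register_store(protocol: str):
    """Decorator that records an opener for one protocol."""
    def decorator(opener):
        _STORE_OPENERS[protocol] = opener
        return opener
    return decorator


@register_store("file")
def open_local(path: str) -> tuple[str, str]:
    return ("LocalStore", path)  # placeholder for constructing a real store


@register_store("zip")
def open_zip(path: str) -> tuple[str, str]:
    return ("ZipStore", path)  # placeholder for constructing a real store


def resolve_store(url: str) -> tuple[str, str]:
    """Look up the opener for the URL's protocol and call it."""
    protocol, sep, rest = url.partition("://")
    if not sep:  # bare path without a protocol: treat it as a local path
        protocol, rest = "file", url
    if protocol not in _STORE_OPENERS:
        raise ValueError(f"no store registered for protocol {protocol!r}")
    return _STORE_OPENERS[protocol](rest)


resolved = [
    resolve_store(u)
    for u in ("zip://example-3.zip", "file://example.zarr", "example.zarr")
]
```

A decorator-based registry lets third-party stores add their own protocol without touching the dispatch function; how protocols would be named and how options would be passed are open design questions that this sketch does not address.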

@aulemahal aulemahal linked a pull request Feb 21, 2025 that will close this issue