-
-
Notifications
You must be signed in to change notification settings - Fork 283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature(group): add group setitem api #2393
Conversation
Are you aiming to get this merged before 3.0 @jhamman? And if not, what is the recommended way to do this with the current beta? My naive attempt in napari/napari#7215 (here to translate: grp[key] = arr is: grp.create_array(name=key, shape=arr.shape, dtype=arr.dtype, data=arr) (which is a little clunky, maybe shape and dtype should be optional if data is provided?) |
putting aside the value of v2 compatibility, I'm not a fan of this API for a few reasons:
for the first point, we could scope all the hierarchy mutation stuff to a
@jni I agree that this is clunky, and it's silly to expose |
Regardless of Citizens United 😂, I don't think attributes have human rights @d-v-b, so I don't think "unfair" is really valid here. Which leaves "arbitrary". I think generally in hierarchical APIs like these it's very common for attributes to be under a .attrs, um, attribute? I think "commonly known API" is not arbitrary, but being a nice player in an ecosystem. As to defaults, I am a big fan of them, let's have more of those. (See also #2083 😉) Either way, I am looking for guidance as to what I should do in napari for now, and also, that guidance should probably go into #2102. 🙏 |
I'm not aware of the hierarchical API ecosystem; I'm coming from a more basic "how do we model the data" perspective, and from that POV, arbitrary symmetry breaking is almost always a mistake. For example, if you had to make a JSON serializable model of a zarr group, you have three options. (Recall that a zarr group is a dict of attributes and a dict of arrays / groups)
We could of course ignore this in our |
ok, fair. ;) |
(but, sometimes, deference to the past can be important. I won't make that judgement call here. But hdf5, and (consequently?) NetCDF, both use the .attrs-style API, if I'm not mistaken. And zarr v2??) |
and I would argue it's a mistake in all three cases. FWIW I didn't think about this much until I did |
My opinion here is that we should include this in 3.0 but consider the following options:
@d-v-b - how would this API feel to you? |
This is definitely an improvement, and I don't think I'm opposed to it. I'm most excited about the deprecation part :) Should we put default array codec settings + chunking in the config, and use that for invocations like |
src/zarr/core/group.py
Outdated
def __iter__(self) -> Generator[tuple[str, Array], None]: | ||
for name, async_array in self._group._sync_iter(self._group._async_group.arrays()): | ||
yield name, Array(async_array) | ||
|
||
def __call__(self) -> Generator[tuple[str, Array], None]: | ||
return iter(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could use some opinions here!
On one hand, iter(group.arrays)
is nice and tidy. But it also doesn't match the existing API (group.arrays()
) and there are other methods where symetry seems useful (array_keys
, array_values
, etc.).
@d-v-b - you'll have an opinion here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what breaks if we remove __call__
? I feel like that's redundant if you can just iterate. And then we could add keys
and values
methods to the ArraysProxy
object that lossily call __iter__
, thereby replacing array_keys
and array_values
?
We end up with a kind of a weird data structure -- something that's asymptotically a dict? But I think that also matches the semantics of "lots of stuff stored on disk with unique names".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what breaks if we remove call
lots of stuff that looks like this (from our test suite):
> assert sorted(group.arrays(), key=lambda x: x[0]) == expected_arrays
E TypeError: 'ArraysProxy' object is not callable
Looking at 2.18, I'm realizing we also are missing the recurse
option to Group.arrays(recurse: bool = False)
...
All of this makes me think it may be asking too much to overload .arrays
with setiem/getitem/iter/etc.
I just ran across this blast from the past: #508 |
This is ready for a review. I've got taken the least invasive route. I think we may want to deprecate this but I see no harm in keeping it now. We can debate its removal after the 3.0 release. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jhamman! Personally I quite like soft deprecations as a way to gently move a community towards other APIs.
@d-v-b - I'm planning to merge this in its current state. If you would like to come behind with a set of deprecations and/or features that add a separate namespace for |
I think #251 was fixed in 2.x at some point but it this PR also makes it work with v3.
closes #251
closes #508
TODO: