-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PR adding support for Zarr V3 #3068
base: main
Are you sure you want to change the base?
Conversation
can work with it. I will revise this PR description later. ### Notes: 1. It does not yet run on github actions without errors. 2. It appears to have no memory leaks 3. It has problems with the testing infrastructure that I am working on. 4. CMake does not work for testing v3_nczarr_test because of CMake Policy [CMP002](https://cmake.org/cmake/help/latest/policy/CMP0002.html). 5. Ubuntu + Automake appears to work including testing v3_nczarr_test. 6. make distcheck does not work because of difficulties with VPATH.
Ok, this branch is usable except for MYS2 and Visual Studio. |
Thanks a lot for including the feature from #3066 @DennisHeimbigner and apologies for the delay in the reply!
There're many ways to improve this but here's one quick fix I did on my side that improves the time it takes to get the header information of the big dataset. (Edited for breaking tests)diff --git a/libnczarr/zinfer.c b/libnczarr/zinfer.c
index e11afd6f4..6181b7e32 100644
--- a/libnczarr/zinfer.c
+++ b/libnczarr/zinfer.c
@@ -51,11 +51,11 @@ struct ZarrObjects {
int zarr_version;
int haszmetadata;
} zarrobjects[] = {
-{"zarr.json", ZARRFORMAT3, 0},
-{".zgroup", ZARRFORMAT2, 0},
-{".zarray", ZARRFORMAT2, 0},
-{".zattrs", ZARRFORMAT2, 0},
-{".zmetadata", ZARRFORMAT2, 1},
+{"/zarr.json", ZARRFORMAT3, 0},
+{"/.zgroup", ZARRFORMAT2, 0},
+{"/.zarray", ZARRFORMAT2, 0},
+{"/.zattrs", ZARRFORMAT2, 0},
+{"/.zmetadata", ZARRFORMAT2, 1},
{NULL, 0, 0},
};
@@ -187,7 +187,6 @@ done:
return THROW(stat);
}
-int
NCZ_infer_open_zarr_format(NC_FILE_INFO_T* file)
{
int stat = NC_NOERR;
@@ -198,8 +197,19 @@ NCZ_infer_open_zarr_format(NC_FILE_INFO_T* file)
NC_UNUSED(file);
/* Probe the map for tell-tale objects and dict keys */
-
if(zarrformat == 0) {
+ size_t seglen = 0;
+ struct ZarrObjects *zo = NULL;
+ stat = NC_ENOTZARR; // Until proven otherwise we aren't sure it's a zarr dataset
+ /* We search on the root path for V2 or V3 tags */
+ for (zo = zarrobjects; zo->name; zo++) {
+ if ((stat = nczmap_exists(zfile->map,zo->name)) == NC_NOERR) {
+ zarrformat = zo->zarr_version;
+ break; /* No need to look for more keys */
+ }
+ }
+ }
+ if(zarrformat == 0 || stat != NC_NOERR) {
/* We need to search subtree for a V2 or V3 tag */
switch(stat = nczmap_walk(zfile->map,"/",tagsearch, ¶m)) {
case NC_ENOOBJECT:
@@ -308,7 +318,7 @@ tagsearch(NCZMAP* map, const char* prefix, const char* key, void* param)
if(seglen == 0) return NC_NOERR;
for(zo=zarrobjects;zo->name;zo++) {
- if(strcasecmp(segment,zo->name)==0) {
+ if(strcasecmp(segment,zo->name+1)==0) {
formats->zarrformat = zo->zarr_version;
return NC_NOERR; /* tell walker to stop */
} |
The first suggestion was breaking some tests so I added a commit to tackle this in a better way: mannreis@806c1a7 (s3consolidate-patch.txt) |
H/T Manual Reis Also turn off some debugging output.
Added a fix for issue (#3075) |
Now that you mention I was also facing out of space issues on my machine. Which went away after a bit of clean up and a reboot. Most data was on ~/.cache |
Can you run the run_ut_mapapi.sh script with the -x flag set so I can see where the url is being constructed? |
@DennisHeimbigner Here you go; I'll let you take a look before I dig into it, but I was going to try to sort this out after a lunch-hour meeting. I initially was poking around in |
@DennisHeimbigner does anything leap out at you from the log? |
Sorry, I got side-tracked. Let me look at it right now. |
Ok, I think I see the problem. Let me test it. |
I was forgetting to localize the file path. |
@DennisHeimbigner I believe it should be the second, |
@DennisHeimbigner here is a log from before the PR, with the file pathing that works. I am surprised to still see the |
@DennisHeimbigner The fact that the pathing used to work makes me wonder now if it was a double red-herring; I'm looking at the following from the first log I provided:
I see a mix of file path styles here; both |
I remember now. It is not clear where the "object not found" is coming from. |
In any case, I don't think any of this affects the "not enough space" problem. |
@DennisHeimbigner It makes sense to me that the 'not enough space' results from trying to access an invalid file path, or a path it does not have access to. Is there a way to check if the path exists during localization, and throw a message if it does not? |
@DennisHeimbigner I followed your suggestion and added the |
A further update, my previous statement was not correct, I've broken down the script a bit and it appears that the error is happening when |
The directory looks ok to me. Let me poke around. |
In order to get some handle on the role of the path converter, I am building a branch |
… in a filesystem which requires it. This may fail on cygwin, but I don't have an easy way to test that, so we shall see what the Github Actions reveal.
Despite the hiccups with the tests, I just want to ask @DennisHeimbigner to drop the |
I assume that means
Correct? |
Yes. I think a separate PR for that will help with closing this one. |
re: Unidata#3068 (comment) At Manuel Reis request, This commit disables the current Zarr-Over-HTTP (ZOH) support for URLS.
I see the out of space problem is still there. |
I had hoped to include this, it feels like we're almost there, but I'm going to get v4.9.3 out and then come back to this for the subsequent v4.10.0 release. |
I agree. This PR has been nothing but a disaster and I take full responsibility for it. |
Update: 1/6/2025: Ok, this branch is usable except for MYS2 and Visual Studio.
They are failing on some kind of "out of space" error whose
meaning I cannot figure out
=============================================
Update (12/29/24): I am going ahead and merging the code from PR #3066 since that code is pretty easy to follow and I have the best knowledge of what kind of tailoring is needed to adapt to the numerous changes.
=============================================
I am posting this PR as a draft so that Manuel Reis and others can work with it. I will revise this PR description later.
Notes:
make distcheck does not work because of difficulties with VPATH.It works with distcheck now.