Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR adding support for Zarr V3 #3068

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

DennisHeimbigner
Copy link
Collaborator

@DennisHeimbigner DennisHeimbigner commented Dec 23, 2024

Update: 1/6/2025: Ok, this branch is usable except for MYS2 and Visual Studio.
They are failing on some kind of "out of space" error whose
meaning I cannot figure out
=============================================
Update (12/29/24): I am going ahead and merging the code from PR #3066 since that code is pretty easy to follow and I have the best knowledge of what kind of tailoring is needed to adapt to the numerous changes.
=============================================

I am posting this PR as a draft so that Manuel Reis and others can work with it. I will revise this PR description later.

Notes:

  1. It does not yet run on github actions without errors.
  2. It appears to have no memory leaks
  3. It has problems with the testing infrastructure that I am working on.
  4. CMake does not work for testing v3_nczarr_test because of CMake Policy CMP002.
  5. Ubuntu + Automake appears to work including testing v3_nczarr_test.
  6. make distcheck does not work because of difficulties with VPATH.It works with distcheck now.

can work with it. I will revise this PR description later.

### Notes:
1. It does not yet run on github actions without errors.
2. It appears to have no memory leaks
3. It has problems with the testing infrastructure that I am working on.
4. CMake does not work for testing v3_nczarr_test because of CMake Policy
   [CMP002](https://cmake.org/cmake/help/latest/policy/CMP0002.html).
5. Ubuntu + Automake appears to work including testing v3_nczarr_test.
6. make distcheck does not work because of difficulties with VPATH.
@DennisHeimbigner
Copy link
Collaborator Author

DennisHeimbigner commented Jan 7, 2025

Ok, this branch is usable except for MYS2 and Visual Studio.
They are failing on some kind of "out of space" error whose
meaning I cannot figure out.

@mannreis
Copy link
Contributor

mannreis commented Jan 9, 2025

Thanks a lot for including the feature from #3066 @DennisHeimbigner and apologies for the delay in the reply!
I just build this branch, rebased on Unidata/main and ran on ncdump -h a huge dataset:

  • I see that although the metadata handler is correctly initialized (dataset is consolidated) the format inference happens before that and it walks the storage which takes a lot of time for big datasets on our system. Since the dataset is consolidated (as verified after the walk), it is better to simply check for the existence of any of the zarr metadata files across all versions.
There're many ways to improve this but here's one quick fix I did on my side that improves the time it takes to get the header information of the big dataset. (Edited for breaking tests)
diff --git a/libnczarr/zinfer.c b/libnczarr/zinfer.c
index e11afd6f4..6181b7e32 100644
--- a/libnczarr/zinfer.c
+++ b/libnczarr/zinfer.c
@@ -51,11 +51,11 @@ struct ZarrObjects {
   int zarr_version;
   int haszmetadata;
} zarrobjects[] = {
-{"zarr.json",  ZARRFORMAT3,    0},
-{".zgroup",    ZARRFORMAT2,    0},
-{".zarray",    ZARRFORMAT2,    0},
-{".zattrs",    ZARRFORMAT2,    0},
-{".zmetadata", ZARRFORMAT2,    1},
+{"/zarr.json", ZARRFORMAT3,    0},
+{"/.zgroup",   ZARRFORMAT2,    0},
+{"/.zarray",   ZARRFORMAT2,    0},
+{"/.zattrs",   ZARRFORMAT2,    0},
+{"/.zmetadata",        ZARRFORMAT2,    1},
{NULL,         0,              0},
};

@@ -187,7 +187,6 @@ done:
   return THROW(stat);
}

-int
NCZ_infer_open_zarr_format(NC_FILE_INFO_T* file)
{
   int stat = NC_NOERR;
@@ -198,8 +197,19 @@ NCZ_infer_open_zarr_format(NC_FILE_INFO_T* file)
   NC_UNUSED(file);

   /* Probe the map for tell-tale objects and dict keys */
-
   if(zarrformat == 0) {
+       size_t seglen = 0;
+       struct ZarrObjects *zo = NULL;
+       stat = NC_ENOTZARR; // Until proven otherwise we aren't sure it's a zarr dataset
+       /* We search on the root path for V2 or V3 tags */
+       for (zo = zarrobjects; zo->name; zo++) {
+               if ((stat = nczmap_exists(zfile->map,zo->name)) == NC_NOERR) {
+                       zarrformat = zo->zarr_version;
+                       break; /* No need to look for more keys */
+               }
+       }
+    }
+    if(zarrformat == 0 || stat != NC_NOERR) {
       /* We need to search subtree for a V2 or V3 tag */
       switch(stat = nczmap_walk(zfile->map,"/",tagsearch, &param)) {
       case NC_ENOOBJECT:
@@ -308,7 +318,7 @@ tagsearch(NCZMAP* map, const char* prefix, const char* key, void* param)
   if(seglen == 0) return NC_NOERR;
   
   for(zo=zarrobjects;zo->name;zo++) {
-       if(strcasecmp(segment,zo->name)==0) {
+       if(strcasecmp(segment,zo->name+1)==0) {
           formats->zarrformat = zo->zarr_version;
          return NC_NOERR; /* tell walker to stop */
      }

@mannreis
Copy link
Contributor

mannreis commented Jan 15, 2025

The first suggestion was breaking some tests so I added a commit to tackle this in a better way: mannreis@806c1a7 (s3consolidate-patch.txt)
This will check for presence of individual metadatafiles and fall back to listing the contents of the storage in case they aren't present on the top level.

@DennisHeimbigner DennisHeimbigner marked this pull request as ready for review January 15, 2025 21:31
@DennisHeimbigner DennisHeimbigner changed the title Draft PR adding support for Zarr V3 PR adding support for Zarr V3 Jan 15, 2025
@DennisHeimbigner
Copy link
Collaborator Author

Added a fix for issue (#3075)

@mannreis
Copy link
Contributor

Now that you mention I was also facing out of space issues on my machine. Which went away after a bit of clean up and a reboot. Most data was on ~/.cache

@DennisHeimbigner
Copy link
Collaborator Author

Can you run the run_ut_mapapi.sh script with the -x flag set so I can see where the url is being constructed?

@WardF
Copy link
Member

WardF commented Jan 22, 2025

Can you run the run_ut_mapapi.sh script with the -x flag set so I can see where the url is being constructed?

@DennisHeimbigner Here you go; I'll let you take a look before I dig into it, but I was going to try to sort this out after a lunch-hour meeting. I initially was poking around in d4pathmgr. c, but since this is only happening in nczarr_test/, that's probably a red herring. This is happening in an MSYS Bash shell, on Github-hosted runners, and on my development VM.

@WardF
Copy link
Member

WardF commented Jan 23, 2025

@DennisHeimbigner does anything leap out at you from the log?

@DennisHeimbigner
Copy link
Collaborator Author

Sorry, I got side-tracked. Let me look at it right now.

@DennisHeimbigner
Copy link
Collaborator Author

Ok, I think I see the problem. Let me test it.

@DennisHeimbigner
Copy link
Collaborator Author

DennisHeimbigner commented Jan 23, 2025

I was forgetting to localize the file path.
On msys, this now produces this file path:
D:/a/netcdf-c/netcdf-c/alltests_1737666259/nczarr_test/testdir_mapapi/tmp_mapapi.file
Does this seem correct?
But there appear to be a problem with the corresponding URK, namely:
file://D:/a/netcdf-c/netcdf-c/alltests_1737666259/nczarr_test/testdir_mapapi/tmp_mapapi.file#mode=nczarr,file
Does this URL work on MSYS2?
Or should it be something like:
file:///d/a/netcdf-c/netcdf-c/alltests_1737666259/nczarr_test/testdir_mapapi/tmp_mapapi.file#mode=nczarr,file

@WardF
Copy link
Member

WardF commented Jan 23, 2025

@DennisHeimbigner I believe it should be the second, file:///d (...) but I'm double checking. For what it's worth, it should be whatever the other non-nczarr tests are doing, as those shell scripts are working.

@WardF
Copy link
Member

WardF commented Jan 23, 2025

@DennisHeimbigner here is a log from before the PR, with the file pathing that works. I am surprised to still see the file:///cygdrive nomenclature, but the pathing does work. It appears that file:/// is the appropriate prefix.

@WardF
Copy link
Member

WardF commented Jan 23, 2025

@DennisHeimbigner The fact that the pathing used to work makes me wonder now if it was a double red-herring; I'm looking at the following from the first log I provided:

Pass: create: create: file:///cygdrive/C/Users/wardf/Desktop/tmp/netcdf-c-DennisHeimbigner-v3plug.dmh/build/alltests_1737569215/nczarr_test/testdir_mapapi/tmp_mapapi.file#mode=nczarr,file
Pass: create: defineobj: /.zgroup
Pass: create: close
Pass: create: open: file:///cygdrive/C/Users/wardf/Desktop/tmp/netcdf-c-DennisHeimbigner-v3plug.dmh/build/alltests_1737569215/nczarr_test/testdir_mapapi/tmp_mapapi.file#mode=nczarr,file
Pass: create: exists: /.zgroup
Pass: create: close
+ cdl=ut_mapapi_create_file.cdl
+ ref=ref_ut_mapapi_create.cdl
+ C:/Users/wardf/Desktop/tmp/netcdf-c-DennisHeimbigner-v3plug.dmh/build/nczarr_test/../nczarr_test/zmapio file://tmp_mapapi.file#mode=nczarr,file
Default type: ubyte
Default action: objdump
fail: NetCDF: Some object not found

I see a mix of file path styles here; both file:/// and C:/Users. Is this potentially a problem resulting in NetCDF: Some object not found?

@DennisHeimbigner
Copy link
Collaborator Author

I remember now.
Using the cygwin form as intermediary is ok, because when the file is opened by libnetcdf,
it first does the localization of the path so you end up with the correct path being passed to open/fopen.

It is not clear where the "object not found" is coming from.
If the file is missing you can test by putting the following command
find ${ISOPATH}
in run_ut_mapapi in the testmapcreate() function.
There is a ````${ZMD}``` command (alias for calling zmapio) in that function.
If you put the above find command just before that ZMD call, it should list
everything in that directory. Then we can see if there is a missing file.

@DennisHeimbigner
Copy link
Collaborator Author

In any case, I don't think any of this affects the "not enough space" problem.

@WardF
Copy link
Member

WardF commented Jan 24, 2025

@DennisHeimbigner It makes sense to me that the 'not enough space' results from trying to access an invalid file path, or a path it does not have access to. Is there a way to check if the path exists during localization, and throw a message if it does not?

@WardF
Copy link
Member

WardF commented Jan 24, 2025

@DennisHeimbigner I followed your suggestion and added the find ${ISOPATH} command. It is completely empty, which will explain why the objects are failing to be found.

@WardF
Copy link
Member

WardF commented Jan 24, 2025

A further update, my previous statement was not correct, I've broken down the script a bit and it appears that the error is happening when zmapio is given the directory; despite the contents being there, it reports a missing object. I've attached the directory below, @DennisHeimbigner can you tell if it is correct/intact? I'm trying to determine if the issue is zmapio reading, or ut_mapapi writing.

@DennisHeimbigner
Copy link
Collaborator Author

The directory looks ok to me. Let me poke around.

@DennisHeimbigner
Copy link
Collaborator Author

In order to get some handle on the role of the path converter, I am building a branch
off of master that disables all path conversions.

… in a filesystem which requires it. This may fail on cygwin, but I don't have an easy way to test that, so we shall see what the Github Actions reveal.
@mannreis
Copy link
Contributor

Despite the hiccups with the tests, I just want to ask @DennisHeimbigner to drop the ZOH changes from this PR (which is not even compiling).

@DennisHeimbigner
Copy link
Collaborator Author

I assume that means

  1. remove the zoh: protocol
  2. remove the zoh mode flag

Correct?

@mannreis
Copy link
Contributor

Yes. I think a separate PR for that will help with closing this one.

@DennisHeimbigner
Copy link
Collaborator Author

I see the out of space problem is still there.

@WardF WardF self-assigned this Jan 30, 2025
@WardF
Copy link
Member

WardF commented Jan 30, 2025

I had hoped to include this, it feels like we're almost there, but I'm going to get v4.9.3 out and then come back to this for the subsequent v4.10.0 release.

@WardF WardF added this to the v4.10.0 milestone Jan 30, 2025
@DennisHeimbigner
Copy link
Collaborator Author

I agree. This PR has been nothing but a disaster and I take full responsibility for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants