-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
broken zarr? #14
Comments
same for https://kitware.github.io/itk-vtk-viewer/app/?gradientOpacity=0.3&image=https://dandiarchive.s3.amazonaws.com/zarr/924410ea-f204-4559-930c-d77677b82a92 for sub-MITU01/ses-20220321h14m52s06/micr/sub-MITU01_ses-20220321h14m52s06_sample-23_stain-NN_run-1_chunk-1_SPIM.ome.zarr |
could also look differently, e.g. on sub-MITU01h3/ses-20211219h13m23s45/micr/sub-MITU01h3_ses-20211219h13m23s45_sample-16_stain-YO_run-1_chunk-1_SPIM.ome.zarr https://kitware.github.io/itk-vtk-viewer/app/?gradientOpacity=0.3&image=https://dandiarchive.s3.amazonaws.com/zarr/022d7840-f696-4455-b67a-759dd6ae6341 |
@yarikoptic - the only way to check in dandi is a match between cli and server side checks. we should upload the zarr checksum with the zarr and for the server side checksum should check if they match, and mark it invalid if it doesn't. zarr allows not having to store every cell of an array. for example, it will simply treat as 0 any chunk that's missing. the browser 404s are not always an issue as far as i know. let's ping @slaytonmarx. at least he should be able to verify checksum locally. is there a list you can provide of paths and zarr checksums that you thought were broken? |
Zarrs might have been broken to start with (incomplete/interrupted generation etc) where checksum checking wouldn't help. I do not have a complete list but later will post a list of some worth checking. |
indeed, but that's a local check to verify the relationship between whatever data format the conversions were made from. and i leave that in the hands of the data producers to check. once given to our toolkits, checksum should verify that. in addition for dandiset 108, we did put in a secondary check, although a basic one, that checks for errors across the pyramid. this is viewable in the dashboard. we will be doing more work in an upcoming bican project, but indeed this dandiset should be checked. |
here is the dashboard: https://biccn.github.io/Quarterly_Submission_Receipts/000108-dashboard.html |
could you point to the code? is it fast/sensible to do on remote (already on s3) zarrs? I see no errors reported in the dashboard -- all green in "errors", so I guess it is not effective either at detecting errors at hand? |
the code is in the example notebooks of the 000108. it again depends on what you mean by errors. missing chunks is not necessarily an error. and yes, the analysis can be run remotely. and it's not all green! and yes, this is a simple check. |
I just got lost in my navigation of it and thought that it was the last one which only depicted the errors and it is green . Indeed some intermediate ones have errors e.g.
on my end I saw lots of as e.g. in this exampleTraceback (most recent call last):
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/mapping.py", line 151, in __getitem__
result = self.fs.cat(k)
^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/implementations/cached.py", line 436, in <lambda>
return lambda *args, **kw: getattr(type(self), item).__get__(self)(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/implementations/cached.py", line 638, in cat
self.fs.get(getpaths, storepaths)
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
^^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/asyn.py", line 650, in _get
return await _run_coros_in_chunks(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/asyn.py", line 254, in _run_coros_in_chunks
await asyncio.gather(*chunk, return_exceptions=return_exceptions),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/asyncio/tasks.py", line 452, in wait_for
return await fut
^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/callbacks.py", line 81, in func
return await fn(path1, path2, callback=child, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/implementations/http.py", line 250, in _get_file
self._raise_not_found_for_status(r, rpath)
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/implementations/http.py", line 215, in _raise_not_found_for_status
raise FileNotFoundError(url)
FileNotFoundError: https://dandiarchive.s3.amazonaws.com/zarr/00291b95-6495-499c-9181-8224ce82ab3f/.zmetadata
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/zarr/storage.py", line 1425, in __getitem__
return self.map[key]
~~~~~~~~^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/fsspec/mapping.py", line 155, in __getitem__
raise KeyError(key)
KeyError: '.zmetadata'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/yoh/proj/dandi/zarr-manifests/./validate_zarr.py", line 98, in <module>
dataset = zarr.open_consolidated(store=store)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/zarr/convenience.py", line 1342, in open_consolidated
meta_store = ConsolidatedStoreClass(store, metadata_key=metadata_key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/zarr/storage.py", line 2944, in __init__
meta = json_loads(self.store[metadata_key])
~~~~~~~~~~^^^^^^^^^^^^^^
File "/home/yoh/miniconda3/envs/zarr-manifests/lib/python3.11/site-packages/zarr/storage.py", line 1427, in __getitem__
raise KeyError(key) from e
KeyError: '.zmetadata'
url: https://dandiarchive.s3.amazonaws.com/zarr/00291b95-6495-499c-9181-8224ce82ab3f #errors: failed-to-open time: -100.00 seconds num_files: N/A total_size est: N/A time: -100.00 and of those kind we have 784 hits out of 4554. (note that the zarrs here are as present on S3, so not necessarily even associated with any asset or specifically with this dandiset) i think the best would be to look into those few I originally reported about
@slaytonmarx also already confirmed that there is validation issues picked up by validating against ngff schema: and that is the errors we (@waxlamp also mentioned them IIRC) observe using ome-ngff validator online which we integrated with the archive, e.g. on which suggests that 000108 zarrs are not "clean" .ome.zarr and need fixing. Hopefully it could be a minor fix so we do not full zarr reuploads and see if we can now benefit from the "multifile" nature of zarrs. Meanwhile I think we should push on adding more validation to |
https://kitware.github.io/itk-vtk-viewer/app/?gradientOpacity=0.3&image=https://dandiarchive.s3.amazonaws.com/zarr/0d5b9be5-e626-4f6a-96da-b6b602954899 didn't show anything while browser console logged:
that is for the sub-U01hm15x_ses-20220731h17m24s47_sample-mEhmAD031x15R2_ABETA_stain-ABETA_run-1_chunk-1_SPIM.ome.zarr
from
https://dandiarchive.org/dandiset/000108/draft/files?location=sub-U01hm15x%2Fses-20220731h17m24s47%2Fmicr%2F
I chose it "randomly" but looking at "smallest 'real' zarrs" in the archive:
The text was updated successfully, but these errors were encountered: