Thoughts on regionally varying keep bits #160
-
@observingClouds I'm super excited about calculating keepbits on a per-chunk basis! This would reduce storage further and be more faithful to the information content because, as you mention, the information content certainly varies in time and space. Snow depth in the US is a good example! We might not want to try this until the Dask workers can calculate the keepbits for each chunk in parallel using the Python implementation of the keepbits algorithm, right?
-
Hey! I read that you are offering to mentor this project for GSoC and I am very interested in applying! I just had a few questions while thinking about a proposal:
I recognize that I am a complete beginner. I am open to any and all of your feedback and suggestions :)
-
In today's Pangeo talk presenting xbitinfo, the idea came up to allow for regionally (spatially, temporally) varying keep bits. One example would be the height dimension. Depending on the variable, a surface layer might contain more actual (or "real") variability than is present at high altitudes. As a consequence, the surface layer needs more keep bits to preserve the mutual information than a stratospheric layer.
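To make the height example concrete, here is a minimal NumPy sketch of bit rounding with a different number of keep bits per vertical level. It is not xbitinfo's implementation: the helper names `bitround` and `bitround_per_level`, the array shape and the keep-bit values are all made up for illustration, and it only handles float32.

```python
import numpy as np

MANTISSA_BITS = 23  # explicit mantissa bits of a float32


def bitround(values, keepbits):
    """Round float32 values to `keepbits` mantissa bits (nearest, ties to even).

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=np.float32)
    if not 0 <= keepbits <= MANTISSA_BITS:
        raise ValueError("keepbits must be between 0 and 23 for float32")
    drop = MANTISSA_BITS - keepbits
    if drop == 0:
        return values.copy()
    bits = values.view(np.uint32)
    shift = np.uint32(drop)
    half_minus_one = np.uint32((1 << (drop - 1)) - 1)
    # Add just under half a unit of the last kept bit, plus the last kept bit
    # itself, so that exact ties round to even.
    bits = bits + half_minus_one + ((bits >> shift) & np.uint32(1))
    # Zero out the dropped mantissa bits.
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)
    return (bits & mask).view(np.float32)


def bitround_per_level(data, keepbits_per_level):
    """Apply a separate keepbits value to each slice along the first axis."""
    data = np.asarray(data, dtype=np.float32)
    if data.shape[0] != len(keepbits_per_level):
        raise ValueError("need exactly one keepbits value per level")
    out = np.empty_like(data)
    for i, keepbits in enumerate(keepbits_per_level):
        out[i] = bitround(data[i], keepbits)
    return out


# Assumed layout: (level, lat, lon), with level 0 the surface layer.
rng = np.random.default_rng(0)
data = rng.uniform(1.0, 2.0, size=(2, 4, 4)).astype(np.float32)
# Made-up values: 12 keep bits at the surface, 3 in the stratosphere.
rounded = bitround_per_level(data, [12, 3])
```

The loop runs over levels only because that is the dimension discussed here; the same pattern would apply to any other dimension or to chunks.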
Theoretically this is already possible:
- _QuantizeBitRoundNumberOfSignificantBits would not be a global attribute and should rather be defined per chunk (or not at all, as it can be retrieved by analysing the data)

How can this be made more user friendly?
In particular:
- Does zarr support different compressors per chunk?

Ultimately, one might want to experiment with retrieving the mutual information at every grid point at every time and applying bit rounding accordingly. Ideally, the keep bits would adjust dynamically. If the grid points are, for example, within an ocean eddy, the number of keep bits increases compared to a rather laminar flow in other parts of the ocean. In this case the retrieval of the bit information needs to be directly linked with the bit rounding.
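On the remark that the number of kept bits could be retrieved by analysing the data instead of being stored as an attribute: below is a rough sketch of how that could look for one float32 chunk that has already been bit-rounded. The function name `infer_keepbits` is hypothetical. The result is only a lower bound on the keepbits that were actually applied, since the values in a chunk may happen to have extra trailing zero bits.

```python
import numpy as np

MANTISSA_BITS = 23  # explicit mantissa bits of a float32


def infer_keepbits(chunk):
    """Smallest number of mantissa bits that represents every value exactly.

    Hypothetical helper: ORs all bit patterns together and counts the
    trailing zero bits of the combined mantissa.
    """
    bits = np.ascontiguousarray(chunk, dtype=np.float32).view(np.uint32)
    combined = int(np.bitwise_or.reduce(bits.ravel())) & 0x7FFFFF
    if combined == 0:
        return 0  # no mantissa bit is set anywhere in the chunk
    # Isolate the lowest set bit and take its position.
    trailing_zeros = (combined & -combined).bit_length() - 1
    return MANTISSA_BITS - trailing_zeros


# 1.5 = 1.1b and 3.0 = 1.1b * 2 need one mantissa bit, 1.25 = 1.01b needs two.
example = np.array([1.5, 1.25, 3.0], dtype=np.float32)
print(infer_keepbits(example))  # 2
```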
I'm curious to see other thoughts.