-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Imaris Reader: support for LZ4 compression and performance improvements #4249
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @marcobitplane, this definitely looks like a reasonable approach.
Two higher-level questions before we proceed with more thorough review and testing:
lz4-java
is added as a dependency, but the existing dependency on aircompressor via ome-codecs should already support LZ4. Is there a reason to uselz4-java
specifically, or couldaircompressor
be used instead?- The new
LZ4Filter
uses theucar.nc2.filter
package; if at all possible, I'd prefer to see this in the existing Bio-Formats package structure (loci.formats.*
) rather thanucar.nc2.filter
. A brief read of https://docs.unidata.ucar.edu/netcdf-java/5.5/userguide/reading_zarr.html#implementing-a-filter suggests the package name of the custom filter itself doesn't matter, as long as it extends/implements the correct class/interface. Could you please either change the package name, or explain a bit why puttingLZ4Filter
inucar.nc2.filter
is necessary?
Finally, in order to merge this we would need a signed Contributor License Agreement. That's not urgent at this point, but it would be good to have sooner rather than later.
Thank you @melissalinkert! I modified the PR to use aircompressor instead of lz4-java and moved LZ4Filter to loci/formats/filter, let me know if that is not ideal or there's more to change there. I'll send the CLA today. |
Adding to tonight's build, so we should know in the morning if there are any test failures on existing data. @marcobitplane : maybe I missed this, but I don't see here or in #4217 a reference to lz4 test files. Do you have any Imaris HDF files with lz4 compression that we can use to test this? |
Thank you @melissalinkert, here are two lz4-compressed cropped versions of Imaris demo images (retina and CellDemoMembrane3D). Feel free to let me know if you would prefer to have more images or have them shared in a different way. |
@marcobitplane thanks for sharing some sample data. Our preferred way to collect such samples is to upload them to the Bio-Formats Zenodo community collection. For the OME team and the community, has the benefit of unambiguously assigning a license for the distribution and re-usage of these samples. In addition, it is possible to add additional samples to the upload and create new versions of the dataset as necessary during the review process. I had a quick go at testing these with the proposed changed and the Bio-Formats command-line utility. With the just released 8.0.1 command-line tools, Bio-Formats fails with
With a local build of Bio-Formats using the HEAD of this PR, I have
adding the
The other sample |
Hello,
this pull request adds support for LZ4 compressed ims files and modifies ImarisHDFReader to avoid multiple reads of the same 3D chunks. See also #4217.
LZ4 support is added using NetCDF-Java's ucar.nc2.filter package, which provides a mechanism for user-supplied filters as described here.
Regarding performance, ImarisHDFReader is modified to have a caching mechanism that reads a stack of planes (as many planes as the chunk z-size) from all channels into a buffer, which only needs to be updated after all data in it has been read. If the size of the buffer would exceed 1GB, the reader falls back to reading the requested plane only. The exact performance improvement will depend on the details of the dataset: in our testing, for 3D datasets the new reader can be over an order of magnitude faster than the existing implementation.