OME Metadata Support #104
Comments
Quick update that with the script below as well as ome/ome-zarr-py#174, it's possible to build a Python reader of the ome_types parser
|
Possibly in scope for this work - see the discussion happening in #107 around the confusion created by the usage of integers for group names. Except in the HCS case, the current |
Obviously, we're closest to 1b2c being usable but it's the least comfortable; like a jacket that's functional but just doesn't fit right. Given this uncomfortable position and certainly the community desire for more thorough use of Zarr group/array metadata and JSON markup I'd suggest that if we decide to formalize it maybe we call it OME-NGFF Transitional or something to that effect? What follows is some background for how
History
For the benefit of everyone let me first outline the history of how When work on To date, Around the same time, we were commissioned by several of our customers to develop tooling to convert Philips' iSyntax file format into OME-TIFF. Due to the nature of Philips' file format at the time, the only option was to depend on Philips' SDK to support this conversion, and the only programming language the SDK was available in was Python. Rewriting the entirety of high performance TIFF writing and OME-XML support present in Bio-Formats and All of this work predates the formalization of OME-NGFF and as stated in the aforementioned blog post "...converts to a temporary N5 or Zarr structure" and "The current N5 or Zarr intermediate format should be considered temporary at this stage of development as it is likely to undergo several changes over the coming months." The default intermediate format was initially N5. Zarr support was poor at best and completely broken at worst.
I hope this helps everyone understand that as things stand today, the primary use case in the wild for
Rationale for some design decisions
The content of
As aforementioned, These criteria are essential for being able to drive
Given the Footnotes |
I seem to remember that Vitessce made use of the |
Vitessce currently uses the metadata from the |
This issue has been mentioned on Image.sc Forum. There might be relevant details there: |
@keller-mark, thanks! @ilan-gold / @manzt: any thoughts on the value (or perhaps the cost) of trying to focus on Vitessce as the client for this work as opposed to updating vizarr to make use of OME-XML? |
If you were to officially focus on OMEXML as the fastest route to full metadata support, we would probably just add this to the core Viv library and then add a loader to Vitessce for it, neither of which would take too long. I can't comment on Vizarr though. |
@ilan-gold : that's certainly the current goal of #112. As the only other reader implementation, if you have any lessons learned to add on top of http://api.csswg.org/bikeshed/?url=https://raw.githubusercontent.com/joshmoore/ngff/bf2raw/latest/index.bs#bf2raw, do let me know. |
@manzt and @joshmoore I am not sure if even Vizarr obeys the directive "SHOULD parse all images" - this seems taxing to do over HTTP and I was under the impression that it is something that was to be avoided because the metadata can often be inferred. Other than this, I don't have any comments. Very exciting! |
Fair point. I think that's more a wording issue than an intent. I mean that clients should not ignore those images, by for example not disclosing their existence to a user. |
It should be straightforward to find and return this metadata for OME-NGFF once a format/location are decided on. As it stands, Viv is fairly unopinionated about what the metadata is; its loaders more or less find OME-XML/.zattrs and the client (Vitessce/Vizarr) is responsible for choosing what to do with it. Given that Vitessce supports both OME-TIFF/OME-NGFF, I'm guessing it will be easier to display OME Metadata out of the box (compared to Vizarr) once support in Viv is added since it is already configured to display/use OME-XML for OME-TIFF. |
Pushed to #112, @ilan-gold, but reading it, I almost wonder if readers MUST detect the presence and SHOULD make it clear to users but only MAY show multiple images. Tricky.
Glad to hear it. Is there anything that should happen from our side to move this forward? One thing to note: there will be a next spec that will replace this one and be more explicit, e.g., the location of the XML might be configurable. But considering the amount of bf2raw 0.4.0 data that's out there, supporting this one may make sense. (If you think it's useful, we can also write down the earlier specs) |
This certainly sounds the most like the behavior you're going for.
Not to my knowledge, but perhaps I missed something here - do we not support something at the moment? Happy to remedy that, but I thought this was a proposal for future implementations |
I think an older version is supported, and by specifying the current one, I was hoping to give everyone in the community enough confidence to implement that while we worked on the replacement. |
@manzt is that what hms-dbmi/viv#403 is referring to? Do you want to lay out any pitfalls before I do this or respond to what Josh has said? |
This issue has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/using-bioformats2raw-for-creating-ome-zarr-scale-format-string/72716/6 |
Dear all, Could you tell me please if it is currently possible to convert OME TIFF without pyramids to OME NGFF? Thank you for any input, |
Hi @aliaksei-chareshneu - yes, certainly. Either with https://www.glencoesoftware.com/products/ngff-converter/ or the underlying https://github.com/glencoesoftware/bioformats2raw command-line tool. Both use Bio-Formats to read files, so they support all the formats that Bio-Formats supports. |
@will-moore, thank you very much. Could you tell me please if it would result in some loss of metadata? |
@aliaksei-chareshneu There shouldn't be a loss of metadata. Both options will generate OME NGFF data in the |
This issue has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/microscopy-metadata-in-zarr-files/87399/2 |
This issue has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/qupath-0-6-0rc1-unable-to-read-remote-ome-zarr-by-url/101605/3 |
This issue has been mentioned on Image.sc Forum. There might be relevant details there: https://forum.image.sc/t/qupath-0-6-0rc1-unable-to-read-remote-ome-zarr-by-url/101605/4 |
This issue captures the requirements as well as the possible implementation choices for a first integration of OME metadata into the OME-NGFF container. The UoD team is targeting the specification as well as implementations, including within OMERO, by mid-2022.
Goal
NGFF specifications up to version 0.4 contain only minimal metadata fields that cover the existing OME model (e.g. physical pixel size). As a result, converting data into OME-NGFF, e.g. with bioformats2raw, loses more metadata than the equivalent conversion to OME-TIFF. The goal of this issue is to achieve parity between the two formats in terms of capturing metadata contained in the OME model (2016-06).
Here we would like to discuss, plan, and specify an initial integration of the OME model into OME-NGFF. As with other specifications, this initial work will likely be followed by multiple, possibly breaking, changes to expand the scope. Where possible, we will also try to capture that roadmap here.
In-scope requirements
The primary requirement is the ability to fully convert an OME-TIFF in its entirety into an OME-NGFF dataset.
As a corollary, it should also be possible to capture what Bio-Formats knows about proprietary file formats (PFFs). This should cover minimally the intermediate Zarr output of bioformats2raw and be readable by all readers (not just raw2ometiff).
The specification should make clear how the combination of the OME model metadata and NGFF metadata is to be interpreted and what readers should do in the case of a conflict.
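To make the conflict requirement concrete, here is a minimal sketch of how a reader might detect a disagreement between the physical pixel size in OME-XML and the scale in the NGFF 0.4 multiscales metadata. The function names, the comparison tolerance, and the assumption that both sides use the same unit are illustrative only; the sketch reports conflicts and deliberately does not decide which side wins, since that is what the specification still has to define.

```python
import math
import xml.etree.ElementTree as ET

OME_NS = {"ome": "http://www.openmicroscopy.org/Schemas/OME/2016-06"}


def ome_physical_sizes(ome_xml: str) -> dict:
    """Return {'x': ..., 'y': ..., 'z': ...} from the first Image's Pixels element."""
    pixels = ET.fromstring(ome_xml).find("ome:Image/ome:Pixels", OME_NS)
    sizes = {}
    for axis in "xyz":
        value = pixels.get(f"PhysicalSize{axis.upper()}")
        if value is not None:
            sizes[axis] = float(value)
    return sizes


def ngff_scales(zattrs: dict) -> dict:
    """Return the per-axis scale of the first dataset of the first multiscale."""
    multiscale = zattrs["multiscales"][0]
    names = [axis["name"] for axis in multiscale["axes"]]
    for transform in multiscale["datasets"][0]["coordinateTransformations"]:
        if transform["type"] == "scale":
            return dict(zip(names, transform["scale"]))
    return {}


def find_conflicts(ome_xml: str, zattrs: dict, rel_tol: float = 1e-6) -> dict:
    """Map each axis present on both sides to (ome_value, ngff_value) if they differ.

    Assumption: both values are expressed in the same unit; a real reader
    would also have to compare PhysicalSize*Unit with the NGFF axis unit.
    """
    ome, ngff = ome_physical_sizes(ome_xml), ngff_scales(zattrs)
    return {
        axis: (ome[axis], ngff[axis])
        for axis in ome.keys() & ngff.keys()
        if not math.isclose(ome[axis], ngff[axis], rel_tol=rel_tol)
    }
```

A reader could call `find_conflicts` once per image and surface a warning for any non-empty result, leaving the choice of precedence to whatever rule the specification eventually adopts.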
Out-of-scope requirements
This need not be the final mechanism used for OME-NGFF to store metadata. It is more important to capture the metadata that exists today.
It is not necessary, at least initially, that an OME-NGFF be fully convertible back into an OME-TIFF since, e.g., there is no location for labels, transformations, or file annotations in OME-TIFF.
Design decision #1: Location of metadata
The current NGFF specification solely uses the format's custom attributes (.zattrs) for storing metadata. Several other locations are conceivable, though some are more or less within the bounds of the Zarr specification. (See zarr-developers/zarr-specs#112 for more discussion.)
Option a) .zattrs
The status quo at the moment is that all metadata should be represented as JSON in .zattrs. The benefit is that no new mechanism needs to be introduced. A downside is that metadata is spread across multiple zgroups and zarrays (see related comments in #102). Projects such as xarray store metadata in “well-known” keys within the .zattrs like _ARRAY_DIMENSIONS (docs).
Option b) Custom files
An alternative is to introduce new files outside the scope of the Zarr spec, which only defines .zattrs, .zarray, .zgroup, and chunk files. bioformats2raw currently stores metadata in a file named METADATA.ome.xml. Other projects like netcdf-c store custom files (e.g. .nczarr; docs) with their own proprietary customizations. The benefit of this strategy is maximum flexibility since no key conflicts can occur. Implementations may need to be aware that such files are essentially 1-dimensional byte arrays.
Option c) Arrays
Metadata files can be encoded as Zarr arrays, which is similar to option b) but does not require introducing any new Zarr behavior. Additionally, the files themselves can carry metadata in their own .zattrs and be chunked. However, all tools that wish to consume them must be Zarr-aware.
Option d) String
Metadata can be encoded as a single (albeit large) string within .zattrs. Depending on Design Decision #2, storing a single string with the metadata has the advantage of working with existing formats as well as consolidating the metadata, but it does require escaping, etc.
Design decision #2: Format of metadata
Similarly to #1, currently all metadata is stored as JSON within .zattrs.
Option a) Design a JSON format
The option closest to the current NGFF process would consist of specifying a new JSON format to capture all of the information in OME-XML. This process would likely be extended and would need to be maintained for some time. One route to achieving it would be to generate json-schema from the XSD using ome-types.
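As a rough illustration of what such a JSON format would have to decide, the sketch below mechanically converts an OME-XML document into nested JSON using only the Python standard library. The conventions here (an `@` prefix for attributes, `#text` for element text, every child collected in a list) are assumptions made for the example, not a proposed NGFF format and not what ome-types produces. Note that all values come out as strings, which is one of the typing questions a real schema would need to settle.

```python
import json
import xml.etree.ElementTree as ET


def element_to_json(element: ET.Element) -> tuple:
    """Return (tag, node) where node is a JSON-serializable dict.

    Hypothetical conventions: attributes are stored under '@name' so they
    cannot collide with child element names; children are grouped into lists
    by tag; non-empty text is stored under '#text'.
    """
    tag = element.tag.split("}", 1)[-1]  # drop the '{namespace}' prefix
    node = {f"@{name}": value for name, value in element.attrib.items()}
    for child in element:
        child_tag, child_node = element_to_json(child)
        node.setdefault(child_tag, []).append(child_node)
    text = (element.text or "").strip()
    if text:
        node["#text"] = text
    return tag, node


def ome_xml_to_json(ome_xml: str) -> str:
    """Serialize a whole OME-XML document as a JSON string."""
    tag, node = element_to_json(ET.fromstring(ome_xml))
    return json.dumps({tag: node}, indent=2)
```

A conversion this naive loses numeric types, units, and reference semantics, which is why the option above speaks of specifying and maintaining a format rather than generating one blindly.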
Option b) Use the JSON-LD syntax of OME-OWL
Using JSON-LD would keep the metadata in JSON but would make use of the existing work on OME-OWL, and therefore not create another format that needs supporting. Additionally, the JSON-LD model provides an extensibility that is needed within the community. The downside is increased complexity in the programming model.
Option c) Store the OME-XML directly.
Finally, if the first goal is to support the existing model, using the OME-XML model is likely the fastest route. Downsides include the general aversion felt towards XML as well as the need to map between XML elements/identities and objects specified within the JSON. There will also be no extensibility (beyond the standard annotations) in the first instance.
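To compare how OME-XML (option c above) would sit in three of the locations from Design decision #1, here is a minimal standard-library sketch that writes the same document as a custom file (option b), as a hand-written one-dimensional Zarr v2 byte array (option c), and as a single string in .zattrs (option d). The sample XML is heavily trimmed and not schema-valid, and the array name `ome_xml`, the attribute key `ome_xml`, and the placement of METADATA.ome.xml at the group root are assumptions made for illustration; none of them is taken from a specification.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical, heavily trimmed OME-XML used only for illustration.
OME_XML = (
    '<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2016-06">'
    '<Image ID="Image:0" Name="example">'
    '<Pixels ID="Pixels:0" PhysicalSizeX="0.5" PhysicalSizeXUnit="µm"/>'
    '</Image></OME>'
)


def write_custom_file(group: Path, xml: str) -> Path:
    """Location option b): a sidecar file next to the Zarr metadata files."""
    path = group / "METADATA.ome.xml"  # assumed to live at the group root
    path.write_text(xml, encoding="utf-8")
    return path


def write_as_array(group: Path, xml: str) -> Path:
    """Location option c): an uncompressed 1-D uint8 Zarr v2 array of UTF-8 bytes."""
    data = xml.encode("utf-8")
    array = group / "ome_xml"  # hypothetical array name
    array.mkdir()
    zarray = {
        "zarr_format": 2, "shape": [len(data)], "chunks": [len(data)],
        "dtype": "|u1", "compressor": None, "filters": None,
        "fill_value": 0, "order": "C",
    }
    (array / ".zarray").write_text(json.dumps(zarray))
    (array / "0").write_bytes(data)  # the single chunk holds the raw bytes
    return array


def write_as_string(group: Path, xml: str) -> Path:
    """Location option d): one string in .zattrs; json.dumps does the escaping."""
    zattrs = group / ".zattrs"
    attrs = json.loads(zattrs.read_text()) if zattrs.exists() else {}
    attrs["ome_xml"] = xml  # hypothetical key
    zattrs.write_text(json.dumps(attrs))
    return zattrs


def demo() -> dict:
    """Write the XML in all three locations and read each copy back."""
    with tempfile.TemporaryDirectory() as tmp:
        group = Path(tmp) / "image.zarr"
        group.mkdir()
        (group / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
        from_file = write_custom_file(group, OME_XML).read_text(encoding="utf-8")
        from_array = (write_as_array(group, OME_XML) / "0").read_bytes().decode("utf-8")
        from_attrs = json.loads(write_as_string(group, OME_XML).read_text())["ome_xml"]
        return {"b": from_file, "c": from_array, "d": from_attrs}
```

All three copies round-trip to the same document; the practical differences are the ones discussed above: only option b) is readable without any Zarr or JSON awareness, option c) requires a Zarr-aware consumer, and option d) depends on correct JSON escaping.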
Implementation reports
Below we enumerate possible implementations and (eventually) the status of investigations into each of them. If anyone else is interested in proposing (or especially prototyping) an implementation, please mention so below.
1b2c: standardize the current bioformats2raw format
Standardizing the bioformats2raw output would require:
An additional benefit of this implementation is that the current bioformats2raw code can be adopted as the official .
Related issues: