OME-XML equivalent data #27

LeeKamentsky · 2021-02-19T13:42:27Z

Hi all, first, thanks for starting this project - we are considering NGFF / Zarr 3.0 for large 3D multichannel datasets of light-sheet microscopy brain data.

I'm wondering if there is any plan to capture OME-XML data in the Zarr attribute hierarchy, in particular the microscopy data such as Instrument. I'd be happy to participate in the discussion and formulation of this extension.

--Lee

joshmoore · 2021-02-19T13:50:46Z

Hey Lee!

Good to hear from you. Short answer is capturing all the content of the OME-XML model in the Zarr (though likely in JSON rather than XML) is definitely on the roadmap. And it'd be wonderful to have your input & help. Don't know if you've seen it yet under the #ome-ngff tag, but there'll be timezone-paired calls next Tuesday if you're interested in getting caught up:

https://forum.image.sc/t/next-call-on-next-gen-bioimaging-data-tools-feb-23/48386/4

All the best,
~Josh

joshmoore · 2021-03-29T11:25:05Z

So this issue has come up again recently with the interest from the aicsimageio team (cc: @jacksonmaxfield) in starting to consume the METADATA.ome.xml file from bioformats2raw sooner rather than later. Ergo, it would need to be part of an upcoming ome-ngff spec. @manzt pointed out that such files within a Zarr fileset are currently outside of the data model: the only files which Zarr knows about are .zgroup, .zarray, .zattrs, and chunks.

The discussion talked through a number of options:

1.) Dump METADATA.xml file in root of zarr store. Pros: simple / currently implemented. Cons: not part of the zarr data model.

.
└── data.zarr/
    ├── .zattrs
    ├── .zgroup
    ├── ...
    └── METADATA.xml

2.) Add METADATA.xml to root .zattrs. Pros: simple / fits zarr's data model. Cons: increases the size of .zattrs; might not be desirable if not a common access pattern.

.
└── data.zarr/
    ├── .zattrs # <- METADATA.xml appended here
    ├── .zgroup
    └── ...
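A minimal sketch of Option 2, assuming a plain dict stands in for the Zarr v2 key/value store and that the XML is kept under a hypothetical top-level attribute named "ome_xml" (no spec defines this name). It shows that the XML survives a JSON round trip as a single string, and that every reader of the root attributes pays for the whole document.

```python
import json


def embed_xml_in_root_attrs(store: dict, xml: str, attr_key: str = "ome_xml") -> int:
    """Option 2 sketch: keep the OME-XML text as one string inside the root .zattrs.

    `store` is a plain dict standing in for a Zarr v2 key/value store
    (keys -> bytes); `attr_key` is a hypothetical attribute name.
    Returns the new size of .zattrs in bytes.
    """
    attrs = json.loads(store.get(".zattrs", b"{}"))
    attrs[attr_key] = xml  # json.dumps escapes the XML's quotes and newlines
    encoded = json.dumps(attrs, indent=2).encode("utf-8")
    store[".zattrs"] = encoded
    return len(encoded)


# Toy example: existing root attributes plus a tiny stand-in for METADATA.xml
store = {
    ".zgroup": json.dumps({"zarr_format": 2}).encode("utf-8"),
    ".zattrs": json.dumps({"multiscales": []}).encode("utf-8"),
}
xml = '<?xml version="1.0" encoding="UTF-8"?>\n<OME><Instrument ID="Instrument:0"/></OME>'
size = embed_xml_in_root_attrs(store, xml)
```

The returned size grows with the XML, which is the "bloats .zattrs" concern in concrete form.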

3.) Add custom OME group to zarr root with XML in attrs. Pros: fits zarr's data model, doesn't bloat root attrs. Cons: slightly more complicated, writes two files (OME/.zattrs, OME/.zgroup)

.
└── data.zarr/
    ├── OME/
    |  ├── .zgroup
    |  └── .zattrs # <- METADATA.xml appended here
    ├── .zattrs  # Don't touch current .zattrs
    ├── .zgroup
    └── ...
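For comparison, a sketch of Option 3 under the same assumptions (a dict as the Zarr v2 store, and a hypothetical attribute name "xml" inside the OME group). It checks the two properties claimed above: exactly two new keys are written, and the root .zattrs is left untouched.

```python
import json


def write_ome_group(store: dict, xml: str, group: str = "OME", attr_key: str = "xml") -> None:
    """Option 3 sketch: put the XML in the attributes of a child group.

    Writes exactly two new keys, <group>/.zgroup and <group>/.zattrs,
    and never touches the root .zattrs. `attr_key` is hypothetical.
    """
    store[f"{group}/.zgroup"] = json.dumps({"zarr_format": 2}).encode("utf-8")
    store[f"{group}/.zattrs"] = json.dumps({attr_key: xml}).encode("utf-8")


store = {
    ".zgroup": json.dumps({"zarr_format": 2}).encode("utf-8"),
    ".zattrs": b'{"multiscales": []}',
}
before = dict(store)
write_ome_group(store, "<OME/>")
new_keys = sorted(set(store) - set(before))  # the two files this option writes
```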

4a.) Add array to zarr root with XML. Pros: scales to handle large files, attributes can be added to the array. Cons: encoding issues, writes two files (OME/.zarray, OME/.zattrs)

├── .zattrs
├── .zgroup
└── OME
    ├── .zarray
    └── .zattrs <-- contains file-level metadata
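One way to picture Option 4a, and the "encoding issues" it raises, is a sketch that stores the XML as a 1-dimensional uint8 Zarr v2 array with a single chunk, i.e. roughly "a 1-dimensional array without chunking". Assumptions: a dict as the store, UTF-8 as the byte encoding, no compressor, the default "." chunk-key separator, non-empty XML, and a hypothetical "encoding" attribute so readers know how to decode the bytes.

```python
import json


def write_xml_array(store: dict, xml: str, path: str = "OME") -> None:
    """Option 4a sketch: XML as a 1-D uint8 Zarr v2 array with one chunk (assumes non-empty XML)."""
    data = xml.encode("utf-8")  # array length is the byte count, not the character count
    n = len(data)
    zarray = {
        "zarr_format": 2,
        "shape": [n],
        "chunks": [n],       # a single chunk covering the whole array
        "dtype": "|u1",      # raw bytes
        "compressor": None,  # uncompressed, so the chunk is just the UTF-8 bytes
        "fill_value": 0,
        "order": "C",
        "filters": None,
    }
    store[f"{path}/.zarray"] = json.dumps(zarray).encode("utf-8")
    # Hypothetical attribute recording how to turn the bytes back into text
    store[f"{path}/.zattrs"] = json.dumps({"encoding": "utf-8"}).encode("utf-8")
    store[f"{path}/0"] = data  # key of the only chunk of a 1-D array


def read_xml_array(store: dict, path: str = "OME") -> str:
    """Read the bytes back and decode them with the recorded encoding."""
    meta = json.loads(store[f"{path}/.zarray"])
    attrs = json.loads(store.get(f"{path}/.zattrs", b"{}"))
    raw = store[f"{path}/0"]
    return raw[: meta["shape"][0]].decode(attrs.get("encoding", "utf-8"))


store = {".zgroup": json.dumps({"zarr_format": 2}).encode("utf-8")}
xml = '<OME><Pixels PhysicalSizeXUnit="µm"/></OME>'
write_xml_array(store, xml)
# "µ" is two bytes in UTF-8, so the array holds one more element than len(xml)
```

Recording the encoding next to the bytes is one possible answer to the encoding concern; any real convention would have to come from the spec discussion.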

4b.) Add array to a root zarr group. Pros: same as 4a, plus other arrays can live in the same location. Cons: yet one more file

├── .zattrs
├── .zgroup
└── OME
    ├── .zgroup <-- contains pointers to files
    └── XML
        ├── .zarray
        └── .zattrs <-- contains file-level metadata

5.) (outside this repository) Add to the zarr specification a concept of Files (other than the .z* files and chunks), defined to be "a 1-dimensional array without chunking" (see v2: interpretation of files outside of the spec zarr-developers/zarr-specs#112)

(Thanks to @jacksonmaxfield and @manzt for driving the definition of the above.)

Update: to be clear, likely whatever mechanism is chosen here will be used for other File objects: opaque analysis results, FileAnnotations from OMERO, etc.

evamaxfield · 2021-03-31T18:50:26Z

Wanted to chime in with my opinion copied over from zulip:

Ranked choice of proposed options (most preferred to least preferred):

Option 2
Option 4a
Option 3
Option 4b
Option 1

Haven't placed Option 5 as I assume we will likely discuss it in the next zarr devs meeting.

manzt · 2021-03-31T19:55:54Z

I have the same rankings as Jackson.

With regard to Option 5, I kind of think the store itself captures the idea of a File object in Zarr; it's just that a Zarr client is limited as to what keys it will read and write. In that sense, it should be "ok" to add any arbitrary File objects to the store as long as the names (keys) don't conflict with something Zarr will read or write. The question is whether Zarr should recognize non-array/group keys or have a formal way of allowing arbitrary non-chunk/metadata objects.
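To make "keys Zarr will read or write" concrete, here is a sketch that lists the keys of a Zarr v2 store that are neither .z* metadata nor chunks, i.e. the arbitrary File objects described above. It assumes a dict as the store and the default "." chunk-key separator, so a key counts as a chunk when its parent path holds a .zarray. With the "/" separator, nested chunk keys would be misclassified; this is an illustration, not how any Zarr library decides.

```python
def _parent_and_name(key: str):
    """Split a store key into (parent path, last segment); the root parent is ""."""
    if "/" in key:
        parent, name = key.rsplit("/", 1)
        return parent, name
    return "", key


ZARR_METADATA = {".zgroup", ".zarray", ".zattrs"}


def foreign_keys(store: dict) -> list:
    """Return the sorted keys a Zarr v2 reader would neither read nor write."""
    # Paths that hold array metadata; their direct children are treated as chunks
    array_paths = {
        _parent_and_name(k)[0] for k in store if _parent_and_name(k)[1] == ".zarray"
    }
    foreign = []
    for key in store:
        parent, name = _parent_and_name(key)
        if name in ZARR_METADATA:
            continue  # group, array, or attribute metadata
        if parent in array_paths:
            continue  # a chunk of a known array
        foreign.append(key)
    return sorted(foreign)


store = {
    ".zgroup": b'{"zarr_format": 2}',
    ".zattrs": b"{}",
    "0/.zarray": b"{}",  # placeholder; its contents are not inspected here
    "0/0.0.0": b"",
    "METADATA.xml": b"<OME/>",
}
files = foreign_keys(store)
```

Under Option 1, METADATA.xml is exactly such a foreign key: safe as long as its name never collides with a metadata or chunk key.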

I'll try to jump on the next zarr dev call :)

LeeKamentsky · 2021-04-01T12:29:19Z

For the BIDS spec one thing that's being discussed is a hierarchy with inheritance. The root might have things that are in common, like the microscopy setup part of the OME-XML (or OME-JSON to be) and sample information. As you went up the tree, to volumes and such, those would get the details of the particular acquisition such as stage position, staining conditions. For BIDS, what we've discussed is it being up to the researcher at what level to put a particular piece of information and an inheritance rule for aggregating everything.

Something like:

|-- .zattrs <- contains attributes common to everything below, e.g. the "Instrument" element
|-- .zgroup
|-- 20210316_70C-1
    |-- .zgroup
    |-- .zattrs
    |-- 20210316_70C-1-R1-YO-CR-GF.zarr
        |-- .zarray
        |-- .zattrs <- Channel info: YO antibody channel 0, CR antibody channel 1, GF antibody channel 2
    |-- 20210322_70C-1-R2-YO-CB-NPY.zarr
    ...

My votes for the above would be 4a, then 2. Offhand, I'd expect the computational and space considerations of duplicating and parsing the metadata to be much less than computing on the data itself. It would be computationally inexpensive (but with possible synchronization issues) to download the JSON into a database and compute on it there. 4a has the flaw, though, of allowing the same key in two places, possibly with inadvertently different values.
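The inheritance rule described above could look like the following sketch: walk from the root down to a given path, merge each level's .zattrs with deeper levels winning, and report keys that were redefined with a different value (the "same key in two places" problem). Assumptions: a dict as the Zarr v2 store, a shallow merge of top-level keys only, and toy attribute names and values; neither OME-NGFF nor BIDS defines this rule here.

```python
import json


def inherited_attrs(store: dict, path: str):
    """Merge .zattrs from the root down to `path`; deeper levels override.

    Returns (merged, overridden), where `overridden` lists (key, level)
    pairs for keys redefined lower in the tree with a different value.
    """
    merged, overridden = {}, []
    parts = [p for p in path.split("/") if p]
    for depth in range(len(parts) + 1):
        prefix = "/".join(parts[:depth])
        key = f"{prefix}/.zattrs" if prefix else ".zattrs"
        if key not in store:
            continue  # levels without attributes contribute nothing
        for k, v in json.loads(store[key]).items():
            if k in merged and merged[k] != v:
                overridden.append((k, prefix))  # conflicting redefinition
            merged[k] = v
    return merged, overridden


# Toy hierarchy mirroring the tree above (values are illustrative only)
store = {
    ".zattrs": json.dumps({"Instrument": {"ID": "Instrument:0"}, "stain": "default"}).encode(),
    "20210316_70C-1/.zattrs": json.dumps({"stain": "YO"}).encode(),
    "20210316_70C-1/20210316_70C-1-R1-YO-CR-GF.zarr/.zattrs": json.dumps(
        {"channels": ["YO", "CR", "GF"]}
    ).encode(),
}
merged, overridden = inherited_attrs(store, "20210316_70C-1/20210316_70C-1-R1-YO-CR-GF.zarr")
```

Reporting overrides rather than forbidding them keeps the researcher free to choose the level for each piece of information while still surfacing accidental conflicts.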

joshmoore · 2021-04-27T10:59:43Z

Copying an update from zarr-developers/zarr-specs#112: the most likely interpretation of additional files there would permit Option 1.

joshmoore · 2021-05-13T11:02:33Z

Note: a related conversation is ongoing under "NCZarr - Netcdf Support for Zarr" (zarr-developers/zarr-specs#41), especially relevant regarding situations where we might want to join together two or more specs (here, OME & BIDS; there, xarray & NetCDF).

imagesc-bot · 2021-08-06T10:57:17Z

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/issue-with-opening-zarr-inside-napari/56089/12

joshmoore · 2022-04-07T06:46:37Z
