OME-XML equivalent data #27

Open
LeeKamentsky opened this issue Feb 19, 2021 · 9 comments

@LeeKamentsky

Hi all, first, thanks for starting this project - we are considering NGFF / Zarr 3.0 for large 3D multichannel datasets of light-sheet microscopy brain data.

I'm wondering if there is any plan to capture OME-XML data in the Zarr attribute hierarchy, in particular the microscopy data such as Instrument. I'd be happy to participate in the discussion and formulation of this extension.

--Lee

@joshmoore
Member

Hey Lee!

Good to hear from you. Short answer: capturing all the content of the OME-XML model in Zarr (though likely in JSON rather than XML) is definitely on the roadmap. And it'd be wonderful to have your input & help. Don't know if you've seen it yet under the #ome-ngff tag, but there'll be timezone-paired calls next Tuesday if you're interested in getting caught up:

https://forum.image.sc/t/next-call-on-next-gen-bioimaging-data-tools-feb-23/48386/4

All the best,
~Josh

@joshmoore
Member

joshmoore commented Mar 29, 2021

So this issue has come up again recently with the interest from the aicsimageio team (cc: @jacksonmaxfield) in starting to consume the METADATA.ome.xml file from bioformats2raw sooner rather than later. Ergo, it would need to be part of an upcoming ome-ngff spec. @manzt pointed out that such files within a Zarr fileset are currently outside of the data model: the only files which Zarr knows about are .zgroup, .zarray, .zattrs, and chunks.

The discussion talked through a number of options:

1.) Dump METADATA.xml file in root of zarr store. Pros: simple / currently implemented, Cons: Not a part of zarr data-model

.
└── data.zarr/
    ├── .zattrs
    ├── .zgroup
    ├── ...
    └── METADATA.xml

2.) Add METADATA.xml to root .zattrs. Pros: simple / fits zarr's data-model, Cons: Increases the size of .zattrs, might not be desirable if not a common access pattern.

.
└── data.zarr/
    ├── .zattrs # <- METADATA.xml appended here
    ├── .zgroup
    └── ...

3.) Add custom OME group to zarr root with XML in attrs. Pros: fits zarr's data model, doesn't bloat root attrs. Cons: slightly more complicated, writes two files (OME/.zattrs, OME/.zgroup)

.
└── data.zarr/
    ├── OME/
    |  ├── .zgroup
    |  └── .zattrs # <- METADATA.xml appended here
    ├── .zattrs  # Don't touch current .zattrs
    ├── .zgroup
    └── ...

4a.) Add array to zarr root with XML. Pros: scales to handle large files, attributes can be added to the array. Cons: encoding issues, writes two files (OME/.zarray, OME/.zattrs)

├── .zattrs
├── .zgroup
└── OME
    ├── .zarray
    └── .zattrs <-- contains file-level metadata

4b.) Add array to a root zarr group. Pros: same plus other arrays can be in the same location, Cons: even one more file

├── .zattrs
├── .zgroup
└── OME
    ├── .zgroup <-- contains pointers to files
    └── XML
        ├── .zarray
        └── .zattrs <-- contains file-level metadata

5.) (outside this repository) Add to the zarr specification a concept of Files (other than the .z* files and chunks) defined to be "a 1-dimensional array without chunking" (see "v2: interpretation of files outside of the spec", zarr-developers/zarr-specs#112).

(Thanks to @jacksonmaxfield and @manzt for driving the definition of the above.)

Update: to be clear, likely whatever mechanism is chosen here will be used for other File objects: opaque analysis results, FileAnnotations from OMERO, etc.
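
As a rough illustration of how Options 3 and 4a differ on disk, here is a minimal sketch that uses only the Python standard library and follows the Zarr v2 directory layout. The attribute names `ome_xml` and `encoding` are hypothetical placeholders (nothing in this thread fixes them), and the helper functions are invented for the example rather than taken from any library.

```python
import json
from pathlib import Path

XML_KEY = "ome_xml"        # hypothetical attribute name, not defined by any spec
ENCODING_KEY = "encoding"  # hypothetical attribute name, not defined by any spec


def write_xml_as_attrs(root: Path, xml: str) -> None:
    """Option 3: an OME group whose .zattrs holds the XML string."""
    group = root / "OME"
    group.mkdir(parents=True, exist_ok=True)
    (group / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    (group / ".zattrs").write_text(json.dumps({XML_KEY: xml}))


def read_xml_from_attrs(root: Path) -> str:
    return json.loads((root / "OME" / ".zattrs").read_text())[XML_KEY]


def write_xml_as_array(root: Path, xml: str) -> None:
    """Option 4a: the XML bytes as a 1-D, single-chunk, uncompressed uint8 array."""
    data = xml.encode("utf-8")
    array_dir = root / "OME"
    array_dir.mkdir(parents=True, exist_ok=True)
    meta = {
        "zarr_format": 2,
        "shape": [len(data)],
        "chunks": [len(data)],  # one chunk holds the whole document
        "dtype": "|u1",
        "compressor": None,     # raw bytes, so the chunk file is the XML itself
        "fill_value": 0,
        "filters": None,
        "order": "C",
    }
    (array_dir / ".zarray").write_text(json.dumps(meta))
    # Recording the text encoding addresses the "encoding issues" con above.
    (array_dir / ".zattrs").write_text(json.dumps({ENCODING_KEY: "utf-8"}))
    (array_dir / "0").write_bytes(data)  # chunk key "0" for a 1-D array


def read_xml_from_array(root: Path) -> str:
    array_dir = root / "OME"
    encoding = json.loads((array_dir / ".zattrs").read_text())[ENCODING_KEY]
    return (array_dir / "0").read_bytes().decode(encoding)
```

Both variants use the name OME, so they are alternatives for the same store rather than something to combine. Option 2 would look like the first function, except that the key is merged into the existing root .zattrs instead of a new group.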

@evamaxfield

Wanted to chime in with my opinion copied over from zulip:

Ranked choice of proposed options (most preferred to least preferred):

  1. Option 2
  2. Option 4a
  3. Option 3
  4. Option 4b
  5. Option 1

Haven't placed Option 5 as I assume we will likely discuss it in next zarr devs meeting.

@manzt

manzt commented Mar 31, 2021

I have the same rankings as Jackson.

With regard to Option 5, I kind of think the store itself captures the idea of a File object in Zarr; it's just that a Zarr client is limited as to what keys it will read and write. In that sense, it should be "ok" to add any arbitrary File objects to the store as long as the names (keys) don't conflict with something Zarr will read or write. The question is whether Zarr should recognize non-array/group keys or have a formal way of allowing arbitrary non-chunk/metadata objects.
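
To make the "don't conflict" condition concrete, here is a small sketch of a conservative check, assuming the Zarr v2 key conventions (the three metadata names listed earlier in the thread, plus chunk keys made of integers separated by `.` or `/`). The function name `conflicts_with_zarr` is made up for this example.

```python
import re

# Metadata keys a Zarr v2 client reads and writes.
METADATA_NAMES = {".zgroup", ".zarray", ".zattrs"}

# A chunk key's last path segment is either "0.0.1" (dot separator)
# or a bare integer such as "1" (slash separator: "0/0/1").
CHUNK_SEGMENT = re.compile(r"^\d+(\.\d+)*$")


def conflicts_with_zarr(key: str) -> bool:
    """Return True if the final segment of a store key looks like something Zarr owns.

    This is deliberately conservative: it flags any chunk-like name, even
    inside a group where no chunks would actually be written.
    """
    name = key.rsplit("/", 1)[-1]
    return name in METADATA_NAMES or CHUNK_SEGMENT.match(name) is not None
```

Under this check a key such as `METADATA.xml` is free to use, while `OME/.zattrs` or `0.0.1` is not.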

I'll try to jump on the next zarr dev call :)

@LeeKamentsky
Author

For the BIDS spec one thing that's being discussed is a hierarchy with inheritance. The root might have things that are in common, like the microscopy setup part of the OME-XML (or OME-JSON to be) and sample information. As you went up the tree, to volumes and such, those would get the details of the particular acquisition such as stage position, staining conditions. For BIDS, what we've discussed is it being up to the researcher at what level to put a particular piece of information and an inheritance rule for aggregating everything.

Something like:

|-- .zattrs <- contains attributes common to everything below, e.g. the "Instrument" element
|-- .zgroup
|-- 20210316_70C-1
    |-- .zgroup
    |-- .zattrs
    |-- 20210316_70C-1-R1-YO-CR-GF.zarr
        |-- .zarray
        |-- .zattrs <- Channel info: YO antibody channel 0, CR antibody channel 1, GF antibody channel 2
    |-- 20210322_70C-1-R2-YO-CB-NPY.zarr
    ...

My votes for the above would be 4a, then 2. Offhand I'd expect the computational and space considerations of duplicating and parsing the metadata to be much less than computing on the data itself. It would be computationally inexpensive (but with possible synchronization issues) to download the JSON into a database and compute on it there. 4a has the flaw, though, of the same key in two places, inadvertently with different values.
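
A minimal sketch of what such an inheritance rule could look like, using only the standard library. The rule itself (shallow merge from the root down, deeper levels overriding shallower ones) is an assumption made for illustration, since neither BIDS nor this thread pins one down, and `inherited_attrs` is a hypothetical helper. It also reports keys that appear at more than one level with different values, which is the kind of inadvertent duplication mentioned above.

```python
import json
from pathlib import Path


def inherited_attrs(root: Path, node: Path):
    """Merge .zattrs from root down to node.

    Returns (merged, conflicts), where conflicts lists the keys that were
    redefined with a different value at a deeper level.
    """
    # Every directory on the path from root to node, shallowest first.
    levels = [root]
    for part in node.relative_to(root).parts:
        levels.append(levels[-1] / part)

    merged = {}
    conflicts = []
    for level in levels:
        attrs_file = level / ".zattrs"
        if not attrs_file.is_file():
            continue  # a level without attributes contributes nothing
        for key, value in json.loads(attrs_file.read_text()).items():
            if key in merged and merged[key] != value:
                conflicts.append(key)
            merged[key] = value  # assumed rule: the deeper level wins
    return merged, conflicts
```

Called with the root of the tree above and the path to one of the volume arrays, this would return the root-level "Instrument" entry together with that volume's channel info.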

@joshmoore
Member

Copying an update from zarr-developers/zarr-specs#112: the most likely interpretation of additional files would enable Option 1.

@joshmoore
Member

joshmoore commented May 13, 2021

Note: a related conversation is ongoing under "NCZarr - Netcdf Support for Zarr" (zarr-developers/zarr-specs#41). It is especially relevant to situations where we might want to join together two or more specs (here, OME & BIDS; there, xarray & NetCDF).

@imagesc-bot

This issue has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/issue-with-opening-zarr-inside-napari/56089/12

@joshmoore
Member

see also: #104
