Automatic serialization/deserialization of Numpy arrays #13

jtc42 · 2020-02-07T11:39:11Z

See https://gitlab.com/openflexure/openflexure-microscope-server/-/blob/master/openflexure_microscope/utilities.py#L25

We should introduce into the spec common scientific data types, and sensible ways to (de)serialize them.

Start with Numpy arrays (as we've already done this, see link above).

Discussion: What other scientific data types might be useful?

ChasNelson1990 · 2020-03-20T16:10:36Z

So JSON is definitely a good way to deal with dicts and even pd.DataFrame objects, but is it the right way to deal with np.ndarrays?

Also, many of our arrays might be more suitable as xarray objects? In which case using HDF5 as a base type makes sense. xarray recommends netCDF (based on HDF5): http://xarray.pydata.org/en/stable/io.html

Pandas has hdf5 support (it might be an additional package, can't remember) and there's https://www.h5py.org/ for more general use.

Worth considering compression though, which is not covered by most existing python hdf5 packages (I think).

jtc42 · 2020-03-20T16:53:13Z

So the logic here is that, because of the structure of the API, and especially the websocket-based data event stuff, we should be able to include data within a JSON object.

This is similar to how things like OME-XML work, in that they have XML metadata which contains a binary blob.

I'm more than happy to have suggestions on encoding formats, but whatever we choose, it would be nice if it could be sensibly embedded within a JSON object.

np.ndarrays are nice because they're just C-contiguous arrays. The encoding we use in the OFM software gives you type information, array dimensions, and a base64-encoded binary blob of the array, which means it can be used outside of Python if you want.

That said, I'm not married to the idea.

We could move to a model of JSON linking to a separate binary file, but especially for small data sets, there's something appealing about the data and its metadata all being contained within a single object.

As usual, open to suggestions.

ChasNelson1990 · 2020-03-20T17:15:14Z

Hmm... that makes sense I guess... a quick Google on why not to use HDF5 came up with https://cyrille.rossant.net/moving-away-hdf5/
This, and other links, do seem to suggest storing big blobs separately with JSON holding the metadata and, for small things, just using JSON.

For reference, this matches the xarray system too, I believe: the actual array is just a NumPy array and the xarray object basically wraps that with a 'metadata' layer, i.e. like column names in a pd.Series.

jtc42 · 2020-03-20T17:21:26Z

Oh that's super useful, thanks!

So it might be that we just add in xarray support (it does seem really sensible) which would serialise like:

Ndarray:

{
        "@type": "ndarray",
        "dtype": << data type >>,
        "shape": << array shape >>,
        "base64": << base 64 encoded blob >>
}
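
The ndarray form above could be produced and consumed roughly as follows. This is a minimal sketch, not the OpenFlexure implementation: the function names `encode_ndarray` and `decode_ndarray` are hypothetical, and storing the dtype as NumPy's `dtype.str` string is an assumption.

```python
import base64
import json

import numpy as np


def encode_ndarray(arr):
    """Return a JSON-compatible dict describing a NumPy array (hypothetical helper)."""
    arr = np.ascontiguousarray(arr)  # make sure the buffer is C-contiguous
    return {
        "@type": "ndarray",
        "dtype": arr.dtype.str,  # assumption: e.g. "<f8", which includes byte order
        "shape": list(arr.shape),
        "base64": base64.b64encode(arr.tobytes()).decode("ascii"),
    }


def decode_ndarray(obj):
    """Rebuild the array from the dict produced by encode_ndarray."""
    raw = base64.b64decode(obj["base64"])
    flat = np.frombuffer(raw, dtype=np.dtype(obj["dtype"]))
    # frombuffer returns a read-only view, so copy to get a writable array
    return flat.reshape(obj["shape"]).copy()


# Round trip through an ordinary JSON string
original = np.arange(6, dtype=np.float64).reshape(2, 3)
payload = json.dumps({"data": encode_ndarray(original)})
restored = decode_ndarray(json.loads(payload)["data"])
```

Because the blob is just raw bytes plus a dtype and shape, a non-Python client only needs a base64 decoder and a typed-array view to read it.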

Xarray:

{
        "@type": "xarray",
        "dtype": << data type >>,
        "shape": << array dims >>,
        "coords": << xarray coords dict >>,
        "attrs": << xarray attrs dict >>,
        "base64": << base 64 encoded blob >>
}
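
A similar sketch for the xarray form, assuming an `xarray.DataArray` whose coordinates are one-dimensional dimension coordinates and whose coords and attrs hold JSON-friendly values. The helper names are hypothetical, and the extra `"dims"` key (dimension names) is an assumption that is not part of the proposal above; it is added here because the names are needed to rebuild the object.

```python
import base64
import json

import numpy as np
import xarray as xr


def encode_dataarray(da):
    """Return a JSON-compatible dict for a DataArray (hypothetical helper)."""
    values = np.ascontiguousarray(da.values)
    return {
        "@type": "xarray",
        "dtype": values.dtype.str,
        "shape": list(values.shape),
        "dims": list(da.dims),  # assumption: not in the proposed schema
        # Only dimension coordinates are handled in this sketch
        "coords": {
            name: da.coords[name].values.tolist()
            for name in da.dims
            if name in da.coords
        },
        "attrs": dict(da.attrs),
        "base64": base64.b64encode(values.tobytes()).decode("ascii"),
    }


def decode_dataarray(obj):
    """Rebuild the DataArray from the dict produced by encode_dataarray."""
    raw = base64.b64decode(obj["base64"])
    values = np.frombuffer(raw, dtype=np.dtype(obj["dtype"])).reshape(obj["shape"]).copy()
    return xr.DataArray(values, dims=obj["dims"], coords=obj["coords"], attrs=obj["attrs"])


da = xr.DataArray(
    np.arange(6, dtype=np.float32).reshape(2, 3),
    dims=("y", "x"),
    coords={"y": [0.0, 1.0], "x": [10, 20, 30]},
    attrs={"units": "counts"},
)
round_tripped = decode_dataarray(json.loads(json.dumps(encode_dataarray(da))))
```

Coordinates with non-JSON types (datetimes, for example) or coordinates that span several dimensions would need extra handling that this sketch leaves out.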

Then in cases where the data is being stored separately, we just return a link to the object binary (.npz (numpy), netCDF (xarray)) file.
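
One way the inline-versus-linked choice could look, as a sketch only: the size threshold, the `"href"` key and the `serialise_array` helper are all assumptions made for illustration. Small arrays are embedded as base64, larger ones are written to a compressed `.npz` file and only a link is returned.

```python
import base64
import tempfile
from pathlib import Path

import numpy as np

INLINE_LIMIT_BYTES = 64 * 1024  # hypothetical threshold for embedding data in JSON


def serialise_array(arr, storage_dir, name):
    """Embed small arrays as base64; store large ones as .npz and return a link."""
    arr = np.ascontiguousarray(arr)
    doc = {"@type": "ndarray", "dtype": arr.dtype.str, "shape": list(arr.shape)}
    if arr.nbytes <= INLINE_LIMIT_BYTES:
        doc["base64"] = base64.b64encode(arr.tobytes()).decode("ascii")
    else:
        path = Path(storage_dir) / f"{name}.npz"
        np.savez_compressed(path, data=arr)  # compressed NumPy archive
        doc["href"] = path.resolve().as_uri()  # assumption: key name for the link
    return doc


with tempfile.TemporaryDirectory() as tmp:
    small = serialise_array(np.zeros(10), tmp, "small")
    large = serialise_array(np.zeros((512, 512)), tmp, "large")
    # The linked file can be read back with np.load
    with np.load(Path(tmp) / "large.npz") as archive:
        reloaded = archive["data"]
```

In a real server the link would more likely be an HTTP URL served by the API than a `file://` URI; the file URI just keeps the sketch self-contained.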

ChasNelson1990 · 2020-03-20T17:28:47Z

Looks sensible.

glyg · 2021-11-26T10:01:50Z

Hi! For now, is it OK to copy/paste the openflexure serialization code into one's own code base?
For smallish data, it seems sufficient to base64 encode an array directly in the response.

jtc42 · 2021-11-26T10:06:10Z

> Hi! For now, is it OK to copy/paste the openflexure serialization code into one's own code base?
> For smallish data, it seems sufficient to base64 encode an array directly in the response.

Yeah, you should be able to use whatever code you like as long as it's within the terms of the GPL license (https://gitlab.com/openflexure/openflexure-microscope-server/-/blob/master/LICENSE).

jtc42 added the labels "critical" (On the critical path) and "enhancement" (New feature or request) on Feb 7, 2020

ChasNelson1990 mentioned this issue Mar 20, 2020

JSON (de)serializer to fall back to something sensible (but not pickle) #14

Closed

glyg mentioned this issue Nov 26, 2021

Additional Thing metadata #16

Open

rwb27 added a commit to rwb27/python-labthings that referenced this issue May 15, 2024

Merge pull request labthings#13 from rwb27/CI-debug

ab02e3e

Fix CI issues