Issues with array shape when using dask_image.imread (and dask.array.image.imread) vs. imageio.imread #239

Closed
habi opened this issue Jun 22, 2021 · 6 comments

Comments

@habi

habi commented Jun 22, 2021

I'm loading a big bunch of tomographic data in a preview/analysis notebook.
As we keep scanning samples, the dataframe I put the preview images in gets larger and larger.
I've been using imageio.imread to load the preview images (middle axial slices and MIPs) from disk.
I'd like to switch to dask_image.imread for this, since I already load the full datasets with it and generate the preview files from those stacks.

I noticed that imageio.imread returns an array with two dimensions (the size of the image), while dask_image.imread (and dask.array.image.imread) return an array with three dimensions, the first being 1 and the second and third being the size of the image.
I'm well aware that I can just .squeeze() the array before displaying it with matplotlib, but I would expect all the imread functions to return the same kind of array.

Minimal Complete Verifiable Example:

I've made a fully self-contained gist that shows my issue; it can be found here and can be started in Binder.

It boils down to

import imageio
import dask.array.image
import dask_image.imread
imgio = imageio.imread('random.png')
imgdask = dask.array.image.imread('random.png')
imgdaskimg = dask_image.imread.imread('random.png')

returning different shapes.

This is closely related to #229 :)

@habi habi changed the title Issues with shape when using dask_image.imread (and dask.array.image.imread) vs. imageio.imread Issues with aray shape when using dask_image.imread (and dask.array.image.imread) vs. imageio.imread Jun 22, 2021
@GenevieveBuckley
Collaborator

Thank you for the report (and binder example!) @habi

My best suggestion is to squeeze the array to remove the singleton dimension(s) if they're causing you problems.

import numpy as np

squeezed_imgdask = np.squeeze(imgdask)
squeezed_imgdask.shape
# (100, 100)

Since this has no effect if no singleton dimensions are present, you would be able to add this generally to your code. Then you'll get the same output, regardless of whether you happen to be using dask or not.
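One caveat with a bare squeeze is that it removes every length-1 axis, not only the leading frame axis. Below is a minimal sketch of a narrower alternative, using NumPy arrays as stand-ins for the shapes reported in this thread. The helper name `drop_frame_axis` is hypothetical and not part of dask-image or imageio.

```python
import numpy as np

def drop_frame_axis(img):
    """Drop a leading length-1 axis from arrays with more than 2 dimensions.

    Hypothetical helper for illustration. It assumes that a leading axis of
    length 1 is always a frame axis, which would be wrong for, e.g., an RGB
    image that is a single pixel high.
    """
    if img.ndim > 2 and img.shape[0] == 1:
        return img[0]
    return img

single_frame = np.zeros((1, 100, 100), dtype=np.uint8)  # shape reported for dask_image
plain_2d = np.zeros((100, 100), dtype=np.uint8)         # shape reported for imageio
thin_strip = np.zeros((1, 1, 100), dtype=np.uint8)      # one frame, image 1 pixel high

print(drop_frame_axis(single_frame).shape)  # (100, 100)
print(drop_frame_axis(plain_2d).shape)      # (100, 100)
print(drop_frame_axis(thin_strip).shape)    # (1, 100)
print(np.squeeze(thin_strip).shape)         # (100,)
```

The same indexing works on a dask array, where it stays lazy until the result is computed.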

@habi
Author

habi commented Jul 6, 2021

> My best suggestion is to squeeze the array to remove the singleton dimension(s) if they're causing you problems.

I did squeeze the array in the end, so it's all good :)
I just expected the same return as imageio, maybe that's something to keep in mind for the work in #229.

@GenevieveBuckley
Collaborator

That's good to hear, thanks

@habi
Author

habi commented Mar 29, 2022

I'm again having an issue with this, with a fresh installation of imageio and dask_image in a new conda environment.
The versions are

dask-image                2021.12.0          pyhd8ed1ab_0    conda-forge
imageio                   2.16.1             pyhcf75d05_0    conda-forge

When I load an image (one of thousands :) with

img_imgio = imageio.imread(filename)
img_dask = dask_image.imread.imread(filename)
print(img_imgio.shape)
print(img_dask.shape)

I get (3072, 3072) for imageio and (1, 3072, 3072, 4) for dask_image.
Is there any way to force dask_image to read 'simple' PNGs as 8bit gray images?

@jakirkham
Member

My guess is that the image is being read as RGBA (or similar), so each pixel is split into four uint8 values along the last dimension. This can be fixed by viewing it as uint32. That leaves a singleton dimension behind (so (1, 3072, 3072, 1)), but we can use squeeze for both this and the first dimension, as Genevieve suggested above.

img.view(np.uint32).squeeze()

More broadly, we are looking at moving over to imageio. There is some discussion about this in issue #181.
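To make the shapes concrete, here is a small NumPy sketch of the situation guessed at above. It assumes the PNG was decoded as RGBA with identical R, G and B channels, which the thread does not confirm. The array `rgba` is a synthetic stand-in, not real dask_image output. Note that the uint32 view packs all four channels into one number per pixel, whereas indexing the first channel keeps the 8-bit gray values.

```python
import numpy as np

# Synthetic stand-in for the reported result: 1 frame, 4x5 pixels, 4 uint8 channels.
gray = np.arange(20, dtype=np.uint8).reshape(4, 5)
rgba = np.empty((1, 4, 5, 4), dtype=np.uint8)
rgba[0, ..., :3] = gray[..., np.newaxis]  # R = G = B = gray (assumption)
rgba[0, ..., 3] = 255                     # fully opaque alpha (assumption)

# Viewing as uint32 merges the 4 bytes of each pixel into one value.
packed = rgba.view(np.uint32)
print(packed.shape)  # (1, 4, 5, 1)
packed2d = packed.squeeze()
print(packed2d.shape, packed2d.dtype)  # (4, 5) uint32

# Alternative: keep only the first frame and the first channel.
first_channel = rgba[0, ..., 0]
print(first_channel.shape, first_channel.dtype)  # (4, 5) uint8
```

Under these assumptions, `first_channel` equals the original gray image, while `packed2d` has the right shape but holds packed RGBA values rather than gray levels.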

@habi
Author

habi commented Mar 31, 2022

Thanks for the comment @jakirkham!

The underlying issue is more that I'm using

for c, sample in Data.iterrows():
    Reconstructions[c] = dask_image.imread.imread(os.path.join(sample['Folder'], '*rec*.png'))

to lazily load 10,000+ images from disk (several samples, each with a folder of 1,000+ reconstructions).

From these I then generate files as necessary (axial views and MIPs), but do not view them directly.
It seems to me that I have to switch everything to the 'pure dask' way mentioned in issue #181 above.
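For reference, the derived images mentioned here (a middle axial slice and MIPs) can be sketched as plain array operations. This is a minimal illustration on a random NumPy stand-in, assuming the stack is laid out as (slices, rows, columns). The variable names are hypothetical, and it is not the notebook code from this thread.

```python
import numpy as np

# Random stand-in for one sample's reconstruction stack: (slices, rows, columns).
rng = np.random.default_rng(0)
stack = rng.integers(0, 256, size=(11, 6, 8), dtype=np.uint8)

# Middle slice along the stack axis.
middle_axial = stack[stack.shape[0] // 2]

# Maximum intensity projections (MIPs) along each of the three axes.
mip_axial = stack.max(axis=0)
mip_rows = stack.max(axis=1)
mip_cols = stack.max(axis=2)

print(middle_axial.shape)  # (6, 8)
print(mip_axial.shape)     # (6, 8)
print(mip_rows.shape)      # (11, 8)
print(mip_cols.shape)      # (11, 6)
```

The same expressions work on a dask array, where they only build a task graph; nothing is read from disk until the result is computed or written out.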
