Added Dask loading of files to Live Viewer backend #2312

Open
wants to merge 36 commits into
base: main

Conversation

MikeSullivan7
Collaborator

@MikeSullivan7 MikeSullivan7 commented Aug 8, 2024

Issue

Closes #2311

Description

The Live Viewer now uses Dask to load files and displays them as before. Dask lets us hold a delayed array covering all image data in the directory without loading that data into memory. To display an image in the Live Viewer, the relevant part of the delayed array is "computed" on demand, and the result is not kept in memory.

This lets us run operations on the live data that need the whole image stack (mean, spectrum, etc.) without loading and storing the whole stack in memory at once. This PR is a proof of principle for the usefulness of Dask in Mantid Imaging and lays a foundation of the structures needed to make Dask work.

Both .tif and .fits files are supported, but they are handled separately: Dask does not natively support .fits files, so for those the delayed arrays and computations are built manually.
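The manual route described for .fits files can be sketched as follows. This is a minimal illustration, not the code in this PR: `load_image` is a hypothetical stand-in that fabricates pixel data instead of reading a file, and the fixed `IMAGE_SHAPE` and float32 dtype are assumptions. Each read is wrapped in `dask.delayed`, turned into a lazy array with `dask.array.from_delayed`, and stacked, so nothing is read until `compute()` is called.

```python
import numpy as np
import dask
import dask.array as da

IMAGE_SHAPE = (64, 64)  # assumption: real code would take this from the first file
load_calls = []         # records which "files" have actually been read


def load_image(path):
    """Hypothetical reader: returns synthetic data instead of opening a file."""
    load_calls.append(path)
    index = int(path.split("_")[-1].split(".")[0])
    return np.full(IMAGE_SHAPE, index, dtype=np.float32)


def build_delayed_stack(paths):
    """Build a lazy (n_images, height, width) array, one chunk per image."""
    lazy_images = [
        da.from_delayed(dask.delayed(load_image)(p), shape=IMAGE_SHAPE, dtype=np.float32)
        for p in paths
    ]
    return da.stack(lazy_images, axis=0)


paths = [f"image_{i}.fits" for i in range(10)]
stack = build_delayed_stack(paths)
calls_before_compute = len(load_calls)  # building the stack reads nothing

frame = stack[3].compute()                     # load a single image for display
means = da.mean(stack, axis=(1, 2)).compute()  # per-image mean over the whole stack
```

Only the computed results (`frame`, `means`) stay in memory; the image data itself is re-read each time a computation needs it.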

Testing

make check

Acceptance Criteria

  1. Open MI, open the Live Viewer and point to a folder with data, e.g.
    python -m mantidimaging -lv="C:\Users\ddb29996\Documents\MantidImaging Data\Large Dataset\Flower_WhiteBeam\Tomo"
    It would be preferable to do this with a larger dataset to easily see the benefit of using Dask.

  2. Check that the images load as normal and you can move between frames with no errors or appreciable slowdown.

  3. Perform an "Operation" on the whole image stack. The Live Viewer does not implement these kinds of operations yet, but you can paste the following code at line 346 of mantidimaging/gui/windows/live_viewer/model.py:

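import dask.array
import matplotlib.pyplot as plt

# Mean of each image over its pixel axes, evaluated lazily across the delayed stack
arrmean = dask.array.mean(dask_image_stack.delayed_stack, axis=(1, 2))
plt.plot(arrmean.compute())
plt.show()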

This takes the delayed image stack and calculates a form of spectrum (the mean of each image) for all images in the Live Viewer folder.
While the Live Viewer opens and initialises, watch your RAM usage and check that it does not increase by the size of the image stack (this is easier to see with the Flower_WhiteBeam dataset, which is around 9 GB).

Check that the calculated spectrum is what you would expect for the dataset. For example, for the Flower_WhiteBeam data you should get this:

image

For the MantidImaging Data\Brass\Corrected_Sample_PH20 data, you should get:

image

Repeat this process with both .tif and .fits datasets to make sure both are functional.

Because some of the Live Viewer data structures and flows have changed, the Live Viewer tests have been updated to match.

Documentation

Will add release note

@MikeSullivan7 MikeSullivan7 added Type: Improvement Type: Feature dependencies Pull requests that update a dependency file Quality: Performance rebuild_docker 🐋 Add if you want to force rebuild docker images (ONLY IF MERGING INTO MAIN) labels Aug 8, 2024
@MikeSullivan7 MikeSullivan7 self-assigned this Aug 8, 2024
@coveralls

coveralls commented Aug 9, 2024

Coverage Status

coverage: 74.021% (-0.3%) from 74.322%
when pulling 486da42 on dask_live_viewer
into f3c9e9d on main.

@MikeSullivan7
Collaborator Author

Some Benchmarks:

Running python -m mantidimaging -lv="C:\Users\ddb29996\Documents\MantidImaging Data\mantidimaging-data-main\mantidimaging-data-main\ISIS\IMAT\IMAT00010675\Tomo"

With create_delayed_array=False (no delayed stack is created), running _handle_directory_change in the Live Viewer model takes 0.178 seconds.
With create_delayed_array=True, it takes 9.863 seconds.

I will also check the timings when simulating live data. It would be useful to append to the existing delayed stack rather than recreating and replacing the existing Image_Data objects and their associated delayed arrays.
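One way the append idea could look, as a rough sketch rather than anything implemented in this PR: `append_frame` is a hypothetical helper, and the synthetic frames stand in for lazily loaded images. It uses `dask.array.concatenate` to extend the existing lazy stack by one image instead of rebuilding it from every file.

```python
import numpy as np
import dask.array as da


def append_frame(stack, new_frame):
    """Return a new lazy stack with new_frame added as the last image."""
    # da.stack on a single 2D array gives shape (1, H, W), matching the stack layout
    return da.concatenate([stack, da.stack([new_frame], axis=0)], axis=0)


def make_frame(value, shape=(32, 32)):
    """Synthetic lazy frame (assumption: stands in for a delayed image load)."""
    return da.from_array(np.full(shape, value, dtype=np.float32), chunks=shape)


# Existing stack of three images
stack = da.stack([make_frame(i) for i in range(3)], axis=0)

# A new image arrives: extend the stack rather than recreating it
stack = append_frame(stack, make_frame(3))
appended_means = da.mean(stack, axis=(1, 2)).compute()
```

The existing graph for earlier images is reused, so only the new image adds work when the stack is extended.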

@MikeSullivan7
Collaborator Author

Using the code

        if len(images) % 50 == 0:
            with ExecutionProfiler(msg=f"create delayed array and compute mean for {len(images)} images"):
                dask_image_stack = DaskImageDataStack(images, create_delayed_array=self.create_delayed_array)
                if dask_image_stack.delayed_stack is not None:
                    arrmean = dask.array.mean(dask_image_stack.delayed_stack, axis=(1, 2))
                    print(arrmean.compute())
        else:
            dask_image_stack = DaskImageDataStack(images, create_delayed_array=self.create_delayed_array)

We get:

image

@MikeSullivan7
Collaborator Author

image

I've benchmarked with the smaller and larger datasets and tried rechunking the Dask array away from its default, e.g. chunksize = (1, 512, 512) for a dataset of (512, 512) images. Rechunking with chunks='auto' via dask.array.rechunk makes things slower because of the way the chunks are accessed when we read each image slice to compute the mean.
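To make the chunk layouts concrete, here is a small sketch using synthetic in-memory frames (an assumption; the real stack is built from files). Stacking 2D images gives one chunk per image, and `rechunk` merges several images into each chunk. The example uses an explicit chunk shape rather than 'auto' so the resulting layout is predictable, and it makes no claim about timings.

```python
import numpy as np
import dask.array as da

# Eight synthetic 512x512 frames, each held as a single chunk
frames = [np.full((512, 512), i, dtype=np.float32) for i in range(8)]
stack = da.stack([da.from_array(f, chunks=(512, 512)) for f in frames], axis=0)

# Default layout after stacking: one image per chunk
default_chunksize = stack.chunksize

# Merge four images into each chunk
merged = stack.rechunk((4, 512, 512))
merged_chunksize = merged.chunksize

# The per-image mean gives the same values under either layout
means_default = da.mean(stack, axis=(1, 2)).compute()
means_merged = da.mean(merged, axis=(1, 2)).compute()
```

With one image per chunk, computing a single frame or a per-image mean touches exactly one chunk; larger chunks mean a request for one image pulls in its neighbours too.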

@MikeSullivan7
Collaborator Author

Update to Benchmarking:

image

I've found that we can speed up the calculation of the mean while the LV is running by combining the mean of each incoming image with the stored running mean.
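One possible reading of this, sketched with NumPy only: compute the mean of each image once as it arrives, keep those values, and update a running mean from them instead of recomputing over the whole stack. The class name `IncrementalMean` and its attributes are hypothetical and not taken from the PR.

```python
import numpy as np


class IncrementalMean:
    """Tracks per-image means and their running mean as images arrive."""

    def __init__(self):
        self.image_means = []    # one value per image, i.e. the "spectrum"
        self.running_mean = 0.0  # mean of all per-image means seen so far

    def add_image(self, image):
        """Fold one new image into the stored values and return its mean."""
        image_mean = float(np.mean(image))
        self.image_means.append(image_mean)
        count = len(self.image_means)
        # Standard incremental update: no earlier image needs to be revisited
        self.running_mean += (image_mean - self.running_mean) / count
        return image_mean


rng = np.random.default_rng(0)
images = [rng.random((16, 16)) for _ in range(20)]

tracker = IncrementalMean()
for image in images:
    tracker.add_image(image)

# Reference values from recomputing over the full stack
full_stack = np.stack(images)
reference_spectrum = full_stack.mean(axis=(1, 2))
reference_mean = full_stack.mean()
```

Because every image here has the same shape, the running mean of the per-image means equals the mean over the full stack.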

Labels
dependencies Pull requests that update a dependency file Quality: Performance rebuild_docker 🐋 Add if you want to force rebuild docker images (ONLY IF MERGING INTO MAIN) Type: Feature Type: Improvement
Development

Successfully merging this pull request may close these issues.

Use Dask to load image files in the Live Viewer
2 participants