
On an HPC, saving to zarr is much faster than saving to binary #3630

Open
chrishalcrow opened this issue Jan 20, 2025 · 5 comments
Labels
bug Something isn't working performance Performance issues/improvements

Comments

@chrishalcrow
Collaborator

Hello,

I think that saving to binary isn't parallelising when I run it on the Uni of Edinburgh HPC, Eddie. Here's the code:

import spikeinterface.full as si

# there's some code to get the recording paths, and `get_recording_from` returns an openephys recording object
recording = si.concatenate_recordings([get_recording_from(rec_path) for rec_path in rec_paths])
recording_filtered = si.common_reference(si.bandpass_filter(recording, freq_min=300, freq_max=6000))

si.set_global_job_kwargs(n_jobs=8)
recording_filtered.save_to_zarr(folder="/exports/eddie/scratch/chalcrow/harry_project/temp_zarr")
recording_filtered.save_to_folder(folder="/exports/eddie/scratch/chalcrow/harry_project/temp_binary")

This takes an ~hour-long recording and saves it twice. The output is:

write_zarr_recording
n_jobs=8 - samples_per_chunk=30,000 - chunk_memory=43.95 MiB - total_memory=351.56 MiB - chunk_duration=1.00s
write_zarr_recording: 100%|██████████| 4203/4203 [10:30<00:00,  6.67it/s]

write_binary_recording
n_jobs=8 - samples_per_chunk=30,000 - chunk_memory=43.95 MiB - total_memory=351.56 MiB - chunk_duration=1.00s
write_binary_recording: 100%|██████████| 4203/4203 [1:21:15<00:00,  1.16s/it]

So write_binary_recording takes about 8x as long, which suggests it is not being parallelised. But weirdly, the code reports that it's using n_jobs=8 in both cases. Passing n_jobs directly to save_to_folder doesn't seem to help.

On my personal computer, there is no real difference in the timings between these methods.

Any ideas? Are the parallelisation schemes different for the different file formats?

@chrishalcrow chrishalcrow added bug Something isn't working performance Performance issues/improvements labels Jan 20, 2025
@zm711
Collaborator

zm711 commented Jan 20, 2025

Do you see utilization of the cores change depending on n_jobs on Eddie? Like an htop or something?
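If htop isn't available on the compute node, a quick per-core snapshot from inside the job works too. A rough sketch, assuming psutil is available in the environment (the sampling interval and threshold are arbitrary):

```python
import time
import psutil

# Sample per-core utilisation every couple of seconds while the save is running
# (run this in a second terminal / job step on the same node).
for _ in range(10):
    per_core = psutil.cpu_percent(interval=2, percpu=True)
    busy = sum(1 for pct in per_core if pct > 50)
    print(f"{busy} cores busy: {per_core}")
    time.sleep(1)
```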

@samuelgarcia
Member

The writing is ~6x slower if I am reading correctly.

My guess is that writing on an NFS drive is slow, and having less to write could explain the diff.

Could you make a basic test of the writing speed for this drive?

Also, what is the size ratio between the 2 folders?
My guess is that the compression ratio is between 2.5 and 3.5.
That does not explain the 6x speed ratio on its own.

Maybe locking to write into the same file in the binary case costs a lot on this filesystem.
If so, could you use the threading implementation to check this?
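For the raw write-speed test, something like this rough sketch could work. The scratch path is the one from the original script; the file name, chunk size and total size are just examples:

```python
import os
import time
from pathlib import Path

# Rough sequential write benchmark for the scratch filesystem.
target = Path("/exports/eddie/scratch/chalcrow/harry_project/write_test.bin")
chunk = b"\0" * (64 * 1024 * 1024)  # 64 MiB per write
n_chunks = 32                        # ~2 GiB total

t0 = time.perf_counter()
with open(target, "wb") as f:
    for _ in range(n_chunks):
        f.write(chunk)
    f.flush()
    os.fsync(f.fileno())  # make sure the data actually hits the drive
elapsed = time.perf_counter() - t0

total_gb = len(chunk) * n_chunks / 1e9
print(f"Wrote {total_gb:.1f} GB in {elapsed:.1f} s ({total_gb / elapsed:.2f} GB/s)")
target.unlink()
```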

@chrishalcrow
Collaborator Author

The zarr save took 10.5 mins and the binary save took 1h21m15s = 81.25 mins, so the binary save took 7.74 times as long.

I can track the wallclock time and CPU time on the HPC. I am running both save methods in one script. At the end of the zarr save, we have

wallclock=00:10:13, cpu=01:16:51

meaning that in 10 mins of wallclock time, we used 76 mins of CPU time, which makes sense for an 8x-parallelised piece of code. From this point on, the code is saving the binary file. About ten minutes later, we have

wallclock=00:22:18, cpu=01:33:19

meaning that only about one CPU has been used in the last 10 mins.

So it seems to be a parallelisation thing, for sure! Very confusing.

@zm711
Collaborator

zm711 commented Jan 24, 2025

Is it possible it's not the last ten minutes but the first?

        with open(file_path, "wb+") as file:
            # The previous implementation `file.truncate(file_size_bytes)` was slow on Windows (#3408)
            file.seek(file_size_bytes - 1)
            file.write(b"\0")


        assert Path(file_path).is_file()

We create the empty file first in order to write into it for binary, and this part does not use multiprocessing. Just a stab in the dark. Do you want to time how long it takes to run just this code on your HPC to see if this might contribute? See the sketch below.
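A minimal timing sketch for just that pre-allocation step. The file name is hypothetical and the byte count is only an example; for the real test use n_samples * n_channels * dtype.itemsize of the actual recording:

```python
import time
from pathlib import Path

# Time only the empty-file pre-allocation done before write_binary_recording.
file_path = Path("/exports/eddie/scratch/chalcrow/harry_project/prealloc_test.raw")
file_size_bytes = 50 * 1024**3  # ~50 GiB, example only

t0 = time.perf_counter()
with open(file_path, "wb+") as file:
    file.seek(file_size_bytes - 1)
    file.write(b"\0")
print(f"pre-allocation took {time.perf_counter() - t0:.2f} s")
file_path.unlink()
```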

@samuelgarcia
Member

@zm711: really good point.
This lazy file creation could depend on the file system, and on HPC clusters they sometimes have more complex file systems that do not behave like ext4.
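One way to probe this would be to compare a few pre-allocation strategies directly on the scratch filesystem. A rough sketch, not spikeinterface code, with hypothetical file names and an arbitrary test size:

```python
import os
import time
from pathlib import Path

size = 20 * 1024**3  # 20 GiB, arbitrary test size
base = Path("/exports/eddie/scratch/chalcrow/harry_project")

def timed(label, fn):
    t0 = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - t0:.2f} s")

def seek_write():
    # current spikeinterface approach: seek to the end and write one byte
    with open(base / "t_seek.raw", "wb+") as f:
        f.seek(size - 1)
        f.write(b"\0")

def truncate():
    # previous approach (slow on Windows, see #3408)
    with open(base / "t_trunc.raw", "wb+") as f:
        f.truncate(size)

def fallocate():
    # explicit allocation, Linux/POSIX only
    with open(base / "t_falloc.raw", "wb+") as f:
        os.posix_fallocate(f.fileno(), 0, size)

timed("seek+write", seek_write)
timed("truncate", truncate)
timed("posix_fallocate", fallocate)
for name in ("t_seek.raw", "t_trunc.raw", "t_falloc.raw"):
    (base / name).unlink()
```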
