Speed up dataset tests, compressed writes to zarr3 arrays #963

Merged (8 commits) on Nov 8, 2023

@fm3 fm3 commented Nov 7, 2023

Description:

  • Sometimes the Python tests (group 2) timed out after 30 minutes; other times they finished in about 10 minutes.
  • I’m still unsure what caused this, but since the tests that got stuck were downsampling tests, I dug into those.
  • It turns out they do compressed writes, which were very slow for sharded zarr3 datasets. Each write appears to read the data at the full shard size, insert the written part, and then write everything back.
  • In this full-shard read, I found an unnecessary numpy broadcast, for which this PR adds a shortcut.
  • I also reduced the shard size in the tests, which speeds them up slightly further. With these changes I could no longer reproduce the test timeouts, but I can’t rule out that something else was causing the huge variation.
  • The PR also adds the pytest-timestamper plugin, which adds timestamps to pytest’s output.
  • Small writes to compressed zarr arrays are still much slower than for WKW, which is why I created a follow-up issue: “Small writes to compressed sharded zarr3 mags are slow” (#964).
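
To illustrate the two points above, here is a minimal sketch in plain numpy. It is not the code changed in this PR: `write_into_shard` and `copy_into` are hypothetical helpers, and the in-memory array only stands in for a compressed shard on disk. The first function mimics the read, insert, write-back pattern of a small compressed write; the second shows the kind of shortcut that skips an explicit broadcast when the shapes already match.

```python
import numpy as np


def write_into_shard(shard, offset, data):
    """Hypothetical helper: emulate a small write into a compressed shard."""
    # Stand-in for "decompress and read the full shard".
    updated = np.array(shard, copy=True)
    # Insert the small written block at its offset.
    region = tuple(slice(o, o + s) for o, s in zip(offset, data.shape))
    updated[region] = data
    # Stand-in for "recompress and write the full shard back".
    return updated


def copy_into(out, value):
    """Hypothetical helper: fill `out` from `value`, broadcasting only if needed."""
    if value.shape == out.shape:
        # Shortcut: shapes already match, so no broadcast view is required.
        np.copyto(out, value)
    else:
        np.copyto(out, np.broadcast_to(value, out.shape))


# Assumed sizes, chosen only for illustration.
shard = np.zeros((8, 8, 8), dtype=np.uint8)
block = np.ones((2, 2, 2), dtype=np.uint8)
result = write_into_shard(shard, (4, 4, 4), block)
```

In this toy example, changing 8 voxels means copying all 512 voxels of the shard, which is why a smaller shard size makes small writes cheaper.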

Some stats: slowest tests in test_dataset.py (on local laptop with SSD):

Before (master):

took 12m 54s

163.85s call     tests/dataset/test_dataset.py::test_guided_downsampling[zarr3-output_path3]
158.27s call     tests/dataset/test_dataset.py::test_guided_downsampling[zarr3-output_path4]
84.19s call     tests/dataset/test_dataset.py::test_aligned_downsampling[zarr3-output_path4]
82.51s call     tests/dataset/test_dataset.py::test_aligned_downsampling[zarr3-output_path3]
22.62s call     tests/dataset/test_dataset.py::test_chunking_wk[zarr3-output_path4]
11.28s call     tests/dataset/test_dataset.py::test_for_zipped_chunks[zarr]
10.93s call     tests/dataset/test_dataset.py::test_compression[zarr-output_path2]
7.25s call     tests/dataset/test_dataset.py::test_guided_downsampling[zarr-output_path2]
7.14s call     tests/dataset/test_dataset.py::test_compression[zarr3-output_path4]
6.86s call     tests/dataset/test_dataset.py::test_guided_downsampling[wkw-output_path0]
5.40s call     tests/dataset/test_dataset.py::test_views_are_equal[zarr3-output_path4]
5.28s call     tests/dataset/test_dataset.py::test_guided_downsampling[zarr-output_path1]
5.00s call     tests/dataset/test_dataset.py::test_aligned_downsampling[zarr-output_path2]
4.76s call     tests/dataset/test_dataset.py::test_downsampling[zarr-output_path2]
4.74s call     tests/dataset/test_dataset.py::test_downsampling[zarr3-output_path4]
4.58s call     tests/dataset/test_dataset.py::test_aligned_downsampling[wkw-output_path0]
4.41s call     tests/dataset/test_dataset.py::test_views_are_equal[zarr-output_path2]
4.40s call     tests/dataset/test_dataset.py::test_chunking_wk[zarr-output_path2]
4.38s call     tests/dataset/test_dataset.py::test_writing_subset_of_chunked_compressed_data[zarr3-output_path4]
4.19s call     tests/dataset/test_dataset.py::test_views_are_equal[zarr3-output_path3]
4.01s call     tests/dataset/test_dataset.py::test_downsampling[zarr3-output_path3]
3.99s call     tests/dataset/test_dataset.py::test_chunking_wkw_advanced[zarr3]
3.92s call     tests/dataset/test_dataset.py::test_for_zipped_chunks[wkw]
3.71s call     tests/dataset/test_dataset.py::test_chunked_compressed_write
3.61s call     tests/dataset/test_dataset.py::test_chunking_wk[zarr3-output_path3]
3.60s call     tests/dataset/test_dataset.py::test_downsampling[zarr-output_path1]
3.59s call     tests/dataset/test_dataset.py::test_views_are_equal[wkw-output_path0]
3.42s call     tests/dataset/test_dataset.py::test_aligned_downsampling[zarr-output_path1]
3.23s call     tests/dataset/test_dataset.py::test_chunking_wkw_advanced[wkw]
3.23s call     tests/dataset/test_dataset.py::test_compression[zarr3-output_path3]

After:

took 5m 35s

23.77s call     tests/dataset/test_dataset.py::test_chunking_wk[zarr3-output_path4]
15.78s call     tests/dataset/test_dataset.py::test_guided_downsampling[zarr3-output_path4]
14.51s call     tests/dataset/test_dataset.py::test_guided_downsampling[zarr3-output_path3]
10.90s call     tests/dataset/test_dataset.py::test_compression[zarr-output_path2]
10.57s call     tests/dataset/test_dataset.py::test_for_zipped_chunks[zarr]
9.62s call     tests/dataset/test_dataset.py::test_aligned_downsampling[zarr3-output_path4]
8.35s call     tests/dataset/test_dataset.py::test_aligned_downsampling[zarr3-output_path3]
7.81s call     tests/dataset/test_dataset.py::test_compression[zarr3-output_path4]
7.04s call     tests/dataset/test_dataset.py::test_guided_downsampling[wkw-output_path0]
6.92s call     tests/dataset/test_dataset.py::test_guided_downsampling[zarr-output_path2]
5.56s call     tests/dataset/test_dataset.py::test_views_are_equal[zarr3-output_path4]
5.04s call     tests/dataset/test_dataset.py::test_aligned_downsampling[zarr-output_path2]
4.90s call     tests/dataset/test_dataset.py::test_guided_downsampling[zarr-output_path1]
4.84s call     tests/dataset/test_dataset.py::test_downsampling[zarr-output_path2]
4.77s call     tests/dataset/test_dataset.py::test_downsampling[zarr3-output_path4]
4.57s call     tests/dataset/test_dataset.py::test_views_are_equal[zarr-output_path2]
4.46s call     tests/dataset/test_dataset.py::test_views_are_equal[zarr3-output_path3]
4.28s call     tests/dataset/test_dataset.py::test_writing_subset_of_chunked_compressed_data[zarr3-output_path4]
3.95s call     tests/dataset/test_dataset.py::test_aligned_downsampling[wkw-output_path0]
3.94s call     tests/dataset/test_dataset.py::test_downsampling[zarr3-output_path3]
3.89s call     tests/dataset/test_dataset.py::test_chunking_wk[zarr3-output_path3]
3.87s call     tests/dataset/test_dataset.py::test_aligned_downsampling[zarr-output_path1]
3.73s call     tests/dataset/test_dataset.py::test_chunked_compressed_write
3.69s call     tests/dataset/test_dataset.py::test_for_zipped_chunks[wkw]
3.50s call     tests/dataset/test_dataset.py::test_views_are_equal[zarr-output_path1]
3.44s call     tests/dataset/test_dataset.py::test_chunking_wk[zarr-output_path2]
3.43s call     tests/dataset/test_dataset.py::test_chunking_wkw_advanced[zarr3]
3.42s call     tests/dataset/test_dataset.py::test_chunking_wkw_advanced[wkw]
3.41s call     tests/dataset/test_dataset.py::test_writing_subset_of_chunked_compressed_data[zarr3-output_path3]
3.38s call     tests/dataset/test_dataset.py::test_downsampling[zarr-output_path1]
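
To compare the two listings per test, a small script along these lines could be used. This is a sketch that assumes the `<seconds>s call <test id>` line format shown above; `parse_durations` and `speedups` are hypothetical helper names, and the sample lines are copied from the stats above.

```python
import re
from typing import Dict

# Matches lines such as "163.85s call     tests/...::test_name[param]".
DURATION_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(?:call|setup|teardown)\s+(\S+)\s*$")


def parse_durations(report: str) -> Dict[str, float]:
    """Map each test id in a duration listing to its runtime in seconds."""
    durations = {}
    for line in report.splitlines():
        match = DURATION_LINE.match(line)
        if match:
            durations[match.group(2)] = float(match.group(1))
    return durations


def speedups(before: Dict[str, float], after: Dict[str, float]) -> Dict[str, float]:
    """Return before/after runtime ratios for tests present in both listings."""
    return {
        name: before[name] / after[name]
        for name in before
        if name in after and after[name] > 0
    }


BEFORE = """
163.85s call     tests/dataset/test_dataset.py::test_guided_downsampling[zarr3-output_path3]
84.19s call     tests/dataset/test_dataset.py::test_aligned_downsampling[zarr3-output_path4]
"""
AFTER = """
14.51s call     tests/dataset/test_dataset.py::test_guided_downsampling[zarr3-output_path3]
9.62s call     tests/dataset/test_dataset.py::test_aligned_downsampling[zarr3-output_path4]
"""

ratios = speedups(parse_durations(BEFORE), parse_durations(AFTER))
for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{ratio:5.1f}x  {name}")
```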

Todos:

Make sure to delete unnecessary points or to check all before merging:

  • Updated Changelog

@fm3 fm3 changed the title from “Add pytest-timestamper to debug slow tests” to “Speed up dataset tests, compressed writes to zarr3 arrays” Nov 7, 2023
@fm3 fm3 self-assigned this Nov 8, 2023
@fm3 fm3 marked this pull request as ready for review November 8, 2023 09:08
@fm3 fm3 requested a review from normanrz November 8, 2023 09:48

@normanrz normanrz left a comment

Looks good! I think writing zarr3 needs another closer look, also with regard to safe concurrent writes.

@fm3 fm3 merged commit c09101f into master Nov 8, 2023
@fm3 fm3 deleted the debug-slow-tests branch November 8, 2023 10:04