optimize the medium algorithm #223

Merged
merged 8 commits into from
Oct 14, 2024

folkertdev
Collaborator

much better than before: at level 3 this is roughly 10% faster (wall time) than the baseline

Benchmark 1 (33 runs): ./compress-baseline 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           155ms ± 3.30ms     150ms …  164ms          1 ( 3%)        0%
  peak_rss           24.7MB ± 67.7KB    24.5MB … 24.8MB          0 ( 0%)        0%
  cpu_cycles          671M  ± 12.2M      657M  …  704M           1 ( 3%)        0%
  instructions       1.71G  ±  238      1.71G  … 1.71G           0 ( 0%)        0%
  cache_references   43.8M  ±  552K     42.9M  … 45.1M           2 ( 6%)        0%
  cache_misses       1.16M  ±  300K      787K  … 1.99M           1 ( 3%)        0%
  branch_misses      7.79M  ± 9.15K     7.78M  … 7.81M           0 ( 0%)        0%
Benchmark 2 (36 runs): target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           140ms ± 1.83ms     138ms …  146ms          2 ( 6%)        ⚡-  9.8% ±  0.8%
  peak_rss           24.7MB ± 76.7KB    24.5MB … 24.8MB          0 ( 0%)          +  0.0% ±  0.1%
  cpu_cycles          616M  ± 5.98M      609M  …  636M           2 ( 6%)        ⚡-  8.1% ±  0.7%
  instructions       1.53G  ±  264      1.53G  … 1.53G           1 ( 3%)        ⚡- 10.6% ±  0.0%
  cache_references   43.9M  ±  524K     43.2M  … 45.3M           2 ( 6%)          +  0.3% ±  0.6%
  cache_misses       1.02M  ±  213K      744K  … 1.79M           1 ( 3%)        ⚡- 12.7% ± 10.7%
  branch_misses      7.79M  ± 5.34K     7.78M  … 7.80M           2 ( 6%)          +  0.0% ±  0.0%

but still a ways to go: at level 3, `rs` is still about 7% slower (wall time) than `ng`

Benchmark 1 (38 runs): target/release/examples/blogpost-compress 3 ng silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           131ms ± 1.58ms     129ms …  138ms          2 ( 5%)        0%
  peak_rss           24.7MB ± 58.5KB    24.6MB … 24.8MB          0 ( 0%)        0%
  cpu_cycles          575M  ± 6.40M      570M  …  601M           2 ( 5%)        0%
  instructions       1.30G  ±  239      1.30G  … 1.30G           0 ( 0%)        0%
  cache_references   41.2M  ±  513K     40.3M  … 42.5M           0 ( 0%)        0%
  cache_misses       1.31M  ±  314K      926K  … 2.50M           2 ( 5%)        0%
  branch_misses      7.70M  ± 5.63K     7.69M  … 7.72M           1 ( 3%)        0%
Benchmark 2 (36 runs): target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           140ms ± 1.42ms     138ms …  145ms          4 (11%)        💩+  6.9% ±  0.5%
  peak_rss           24.7MB ± 62.7KB    24.6MB … 24.8MB          0 ( 0%)          -  0.2% ±  0.1%
  cpu_cycles          617M  ± 5.97M      610M  …  637M           1 ( 3%)        💩+  7.1% ±  0.5%
  instructions       1.53G  ±  561      1.53G  … 1.53G           3 ( 8%)        💩+ 17.2% ±  0.0%
  cache_references   44.1M  ±  481K     43.3M  … 45.1M           0 ( 0%)        💩+  7.1% ±  0.6%
  cache_misses       1.07M  ±  212K      790K  … 1.54M           0 ( 0%)        ⚡- 18.3% ±  9.6%
  branch_misses      7.79M  ± 7.97K     7.78M  … 7.82M           3 ( 8%)        💩+  1.2% ±  0.0%

but most of these changes are actually beneficial for all compression levels, so at the higher levels we're doing really well

Benchmark 2 (24 runs): target/release/examples/blogpost-compress 6 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           215ms ± 3.67ms     211ms …  227ms          2 ( 8%)        ⚡-  2.8% ±  0.9%
  peak_rss           24.5MB ±  116KB    24.2MB … 24.6MB          0 ( 0%)          +  0.0% ±  0.2%
  cpu_cycles          963M  ± 14.1M      949M  … 1.01G           1 ( 4%)        ⚡-  2.5% ±  0.7%
  instructions       1.93G  ±  364      1.93G  … 1.93G           2 ( 8%)        💩+ 17.7% ±  0.0%
  cache_references    105M  ± 1.10M      104M  …  108M           0 ( 0%)        💩+  3.3% ±  0.6%
  cache_misses       2.01M  ±  617K     1.36M  … 3.49M           0 ( 0%)          -  9.4% ± 13.8%
  branch_misses      9.25M  ± 9.71K     9.24M  … 9.27M           0 ( 0%)        💩+  4.1% ±  0.1%

Benchmark 2 (12 runs): target/release/examples/blogpost-compress 9 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           419ms ± 4.96ms     415ms …  428ms          0 ( 0%)        ⚡-  4.1% ±  0.7%
  peak_rss           24.4MB ± 81.5KB    24.2MB … 24.5MB          0 ( 0%)          -  0.1% ±  0.3%
  cpu_cycles         1.90G  ± 12.5M     1.89G  … 1.93G           1 ( 8%)        ⚡-  4.5% ±  0.4%
  instructions       3.18G  ±  398      3.18G  … 3.18G           0 ( 0%)        💩+ 12.7% ±  0.0%
  cache_references    195M  ±  955K      194M  …  198M           0 ( 0%)          +  1.1% ±  0.4%
  cache_misses       2.91M  ±  799K     1.80M  … 4.58M           0 ( 0%)          -  8.6% ± 18.4%
  branch_misses      19.1M  ± 48.8K     19.0M  … 19.2M           3 (25%)        ⚡-  8.3% ±  0.1%
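
For readers who want to sanity-check the `delta` column in the tables above, here is a minimal sketch of the arithmetic, assuming the column is the relative change of the mean against Benchmark 1. The function name `relative_delta_percent` is hypothetical, and the inputs are the rounded means printed above, so the results only approximate the reported deltas, which the benchmarking tool presumably derives from unrounded samples.

```rust
/// Relative change of `new` versus `baseline`, in percent.
/// Negative means the new build used less of the measured resource.
fn relative_delta_percent(baseline: f64, new: f64) -> f64 {
    (new - baseline) / baseline * 100.0
}

fn main() {
    // Rounded means copied from the level 3 baseline-vs-PR tables above.
    let rows = [
        ("wall_time (ms)", 155.0, 140.0),
        ("cpu_cycles (M)", 671.0, 616.0),
        ("instructions (G)", 1.71, 1.53),
    ];
    for (name, baseline, new) in rows {
        // Prints the approximate delta with an explicit sign.
        println!("{name:<18} {:+.1}%", relative_delta_percent(baseline, new));
    }
}
```

The same function applied to the `ng` versus `rs` means (131ms and 140ms) gives a positive value, matching the sign of the regression shown in that comparison.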

Benchmark 1 (31 runs): ./compress-baseline 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           164ms ± 3.09ms     161ms …  178ms          2 ( 6%)        0%
  peak_rss           24.7MB ± 52.6KB    24.6MB … 24.8MB          6 (19%)        0%
  cpu_cycles          666M  ± 12.7M      657M  …  725M           2 ( 6%)        0%
  instructions       1.71G  ±  278      1.71G  … 1.71G           0 ( 0%)        0%
  cache_references   43.7M  ±  474K     43.1M  … 45.1M           0 ( 0%)        0%
  cache_misses       1.17M  ±  276K      862K  … 2.25M           1 ( 3%)        0%
  branch_misses      7.78M  ± 8.65K     7.77M  … 7.81M           1 ( 3%)        0%
Benchmark 2 (34 runs): target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           150ms ± 4.69ms     148ms …  175ms          2 ( 6%)        ⚡-  8.1% ±  1.2%
  peak_rss           24.7MB ± 65.5KB    24.6MB … 24.8MB          0 ( 0%)          -  0.2% ±  0.1%
  cpu_cycles          650M  ± 19.1M      640M  …  748M           2 ( 6%)        ⚡-  2.3% ±  1.2%
  instructions       1.66G  ±  298      1.66G  … 1.66G           0 ( 0%)        ⚡-  2.9% ±  0.0%
  cache_references   43.8M  ±  451K     43.0M  … 44.9M           0 ( 0%)          +  0.0% ±  0.5%
  cache_misses       1.09M  ±  267K      758K  … 1.86M           1 ( 3%)          -  6.9% ± 11.5%
  branch_misses      7.80M  ± 12.9K     7.78M  … 7.84M           3 ( 9%)          +  0.2% ±  0.1%

reduces instruction count, does not yet improve performance

Benchmark 2 (36 runs): target/release/examples/blogpost-compress 3 rs silesia-small.tar
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           141ms ± 1.98ms     139ms …  151ms          1 ( 3%)        ⚡-  3.8% ±  1.2%
  peak_rss           24.7MB ± 71.1KB    24.5MB … 24.8MB          0 ( 0%)          -  0.1% ±  0.1%
  cpu_cycles          626M  ± 7.96M      620M  …  667M           1 ( 3%)        ⚡-  3.7% ±  1.0%
  instructions       1.60G  ±  308      1.60G  … 1.60G           0 ( 0%)        ⚡-  3.7% ±  0.0%
  cache_references   43.9M  ±  494K     43.2M  … 45.0M           0 ( 0%)          +  0.3% ±  0.6%
  cache_misses       1.07M  ±  259K      769K  … 1.85M           1 ( 3%)          +  5.8% ± 13.3%
  branch_misses      7.79M  ± 4.52K     7.78M  … 7.80M           2 ( 6%)          -  0.1% ±  0.0%

codecov bot commented Oct 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

  Files with missing lines                    Coverage Δ
  zlib-rs/src/crc32.rs                        96.02% <ø> (-0.87%) ⬇️
  zlib-rs/src/deflate.rs                      96.66% <100.00%> (+<0.01%) ⬆️
  zlib-rs/src/deflate/algorithm/medium.rs     94.01% <100.00%> (-0.11%) ⬇️
  zlib-rs/src/read_buf.rs                     90.47% <100.00%> (+6.83%) ⬆️

... and 3 files with indirect coverage changes

filled: 0,
initialized: 0,
})
buf.fill(MaybeUninit::new(0));
Collaborator

Allocators often have an option to directly allocate pre-zeroed memory. How hard would it be to make use of this when not being passed a custom allocator to use?
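
To make the question concrete, here is a minimal sketch comparing the two ways of getting a zeroed buffer, using only the Rust standard library's global allocator. It is an illustration, not code from this PR: the function names are hypothetical, and the first function merely mirrors the `buf.fill(MaybeUninit::new(0))` line from the diff.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};
use std::mem::MaybeUninit;
use std::slice;

/// Allocates `len` bytes, zeroes them with an explicit fill, and reports
/// whether every byte reads back as zero.
fn zeroed_by_fill(len: usize) -> bool {
    assert!(len > 0, "zero-sized allocations are not supported in this sketch");
    let layout = Layout::array::<u8>(len).expect("valid layout");
    // SAFETY: the layout has non-zero size.
    let ptr = unsafe { alloc(layout) };
    assert!(!ptr.is_null(), "allocation failed");
    // SAFETY: `ptr` is valid for `len` bytes, and `MaybeUninit<u8>` may be uninitialized.
    let buf = unsafe { slice::from_raw_parts_mut(ptr.cast::<MaybeUninit<u8>>(), len) };
    buf.fill(MaybeUninit::new(0));
    // SAFETY: every byte was initialized by the fill above.
    let ok = unsafe { slice::from_raw_parts(ptr, len) }.iter().all(|&b| b == 0);
    // SAFETY: `ptr` came from `alloc` with this exact layout.
    unsafe { dealloc(ptr, layout) };
    ok
}

/// Asks the allocator for pre-zeroed memory directly (calloc-like), so no
/// separate fill pass is needed.
fn zeroed_by_allocator(len: usize) -> bool {
    assert!(len > 0, "zero-sized allocations are not supported in this sketch");
    let layout = Layout::array::<u8>(len).expect("valid layout");
    // SAFETY: the layout has non-zero size.
    let ptr = unsafe { alloc_zeroed(layout) };
    assert!(!ptr.is_null(), "allocation failed");
    // SAFETY: `alloc_zeroed` returns initialized (zeroed) memory of `len` bytes.
    let ok = unsafe { slice::from_raw_parts(ptr, len) }.iter().all(|&b| b == 0);
    // SAFETY: `ptr` came from `alloc_zeroed` with this exact layout.
    unsafe { dealloc(ptr, layout) };
    ok
}

fn main() {
    assert!(zeroed_by_fill(4096));
    assert!(zeroed_by_allocator(4096));
}
```

Both paths end with the same buffer contents; the difference is only whether the zeroing is a separate pass in the caller or is left to the allocator.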

Collaborator Author

it's possible, but a bit cursed maybe? you'd have to check that the allocator is one you know, and could then special-case it to use `calloc` or similar.

Collaborator

Or maybe you could add a field to `Allocator` for zeroed allocation. In the code that accepts an allocator from the C side and turns it into an `Allocator` instance, that field would be either `None` or a function that forwards to the regular alloc function. This way you'd use the zeroed allocator method if the C side didn't specify an allocator to use.
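
One way to read this suggestion, as a rough sketch only: the `Allocator` struct below is a hypothetical stand-in built on the standard library's global allocator, not the real zlib-rs type, and every field and function name in it is an assumption made for illustration. It uses the `None` variant of the idea, falling back to a regular allocation plus manual zeroing when no zeroed entry point is known.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};

/// Hypothetical allocator descriptor (NOT the real zlib-rs `Allocator`).
struct Allocator {
    alloc: fn(Layout) -> *mut u8,
    /// `Some` only when pre-zeroed memory can be requested directly.
    alloc_zeroed: Option<fn(Layout) -> *mut u8>,
    dealloc: fn(*mut u8, Layout),
}

// Thin wrappers over the global allocator; callers must pass non-zero-sized layouts.
fn default_alloc(layout: Layout) -> *mut u8 { unsafe { alloc(layout) } }
fn default_alloc_zeroed(layout: Layout) -> *mut u8 { unsafe { alloc_zeroed(layout) } }
fn default_dealloc(ptr: *mut u8, layout: Layout) { unsafe { dealloc(ptr, layout) } }

impl Allocator {
    /// Used when the C side did not specify an allocator.
    const DEFAULT: Allocator = Allocator {
        alloc: default_alloc,
        alloc_zeroed: Some(default_alloc_zeroed),
        dealloc: default_dealloc,
    };

    /// Stand-in for wrapping a caller-supplied allocator: no zeroed entry point is known.
    fn custom(alloc: fn(Layout) -> *mut u8, dealloc: fn(*mut u8, Layout)) -> Self {
        Allocator { alloc, alloc_zeroed: None, dealloc }
    }

    /// Returns zeroed memory plus a flag telling whether the direct path was used.
    fn allocate_zeroed(&self, layout: Layout) -> (*mut u8, bool) {
        assert!(layout.size() > 0, "zero-sized layouts are not supported in this sketch");
        match self.alloc_zeroed {
            Some(f) => (f(layout), true),
            None => {
                let ptr = (self.alloc)(layout);
                if !ptr.is_null() {
                    // SAFETY: `ptr` is valid for `layout.size()` bytes.
                    unsafe { ptr.write_bytes(0, layout.size()) };
                }
                (ptr, false)
            }
        }
    }
}

/// Allocates `len` zeroed bytes and returns (all bytes zero, direct path used).
fn check(allocator: &Allocator, len: usize) -> (bool, bool) {
    let layout = Layout::array::<u8>(len).expect("valid layout");
    let (ptr, direct) = allocator.allocate_zeroed(layout);
    assert!(!ptr.is_null(), "allocation failed");
    // SAFETY: both paths leave `len` initialized bytes behind `ptr`.
    let all_zero = unsafe { std::slice::from_raw_parts(ptr, len) }.iter().all(|&b| b == 0);
    (allocator.dealloc)(ptr, layout);
    (all_zero, direct)
}

fn main() {
    let custom = Allocator::custom(default_alloc, default_dealloc);
    assert_eq!(check(&Allocator::DEFAULT, 1024), (true, true));
    assert_eq!(check(&custom, 1024), (true, false));
}
```

The optional field keeps the decision in one place: callers always ask for zeroed memory, and only the constructor needs to know whether a direct zeroed allocation is available.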

Collaborator

bjorn3 commented Oct 14, 2024

Opened #224 against this PR to fix CI.

@folkertdev folkertdev merged commit 0ae564c into main Oct 14, 2024
18 checks passed
@folkertdev folkertdev deleted the medium-algorithm-optimize branch October 14, 2024 08:30
@bjorn3 bjorn3 mentioned this pull request Oct 14, 2024