-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize the medium
algorithm
#223
Conversation
Benchmark 1 (31 runs): ./compress-baseline 3 rs silesia-small.tar measurement mean ± σ min … max outliers delta wall_time 164ms ± 3.09ms 161ms … 178ms 2 ( 6%) 0% peak_rss 24.7MB ± 52.6KB 24.6MB … 24.8MB 6 (19%) 0% cpu_cycles 666M ± 12.7M 657M … 725M 2 ( 6%) 0% instructions 1.71G ± 278 1.71G … 1.71G 0 ( 0%) 0% cache_references 43.7M ± 474K 43.1M … 45.1M 0 ( 0%) 0% cache_misses 1.17M ± 276K 862K … 2.25M 1 ( 3%) 0% branch_misses 7.78M ± 8.65K 7.77M … 7.81M 1 ( 3%) 0% Benchmark 2 (34 runs): target/release/examples/blogpost-compress 3 rs silesia-small.tar measurement mean ± σ min … max outliers delta wall_time 150ms ± 4.69ms 148ms … 175ms 2 ( 6%) ⚡- 8.1% ± 1.2% peak_rss 24.7MB ± 65.5KB 24.6MB … 24.8MB 0 ( 0%) - 0.2% ± 0.1% cpu_cycles 650M ± 19.1M 640M … 748M 2 ( 6%) ⚡- 2.3% ± 1.2% instructions 1.66G ± 298 1.66G … 1.66G 0 ( 0%) ⚡- 2.9% ± 0.0% cache_references 43.8M ± 451K 43.0M … 44.9M 0 ( 0%) + 0.0% ± 0.5% cache_misses 1.09M ± 267K 758K … 1.86M 1 ( 3%) - 6.9% ± 11.5% branch_misses 7.80M ± 12.9K 7.78M … 7.84M 3 ( 9%) + 0.2% ± 0.1%
reduces instruction count, does not yet improve performance
Benchmark 2 (36 runs): target/release/examples/blogpost-compress 3 rs silesia-small.tar measurement mean ± σ min … max outliers delta wall_time 141ms ± 1.98ms 139ms … 151ms 1 ( 3%) ⚡- 3.8% ± 1.2% peak_rss 24.7MB ± 71.1KB 24.5MB … 24.8MB 0 ( 0%) - 0.1% ± 0.1% cpu_cycles 626M ± 7.96M 620M … 667M 1 ( 3%) ⚡- 3.7% ± 1.0% instructions 1.60G ± 308 1.60G … 1.60G 0 ( 0%) ⚡- 3.7% ± 0.0% cache_references 43.9M ± 494K 43.2M … 45.0M 0 ( 0%) + 0.3% ± 0.6% cache_misses 1.07M ± 259K 769K … 1.85M 1 ( 3%) + 5.8% ± 13.3% branch_misses 7.79M ± 4.52K 7.78M … 7.80M 2 ( 6%) - 0.1% ± 0.0%
Codecov ReportAll modified and coverable lines are covered by tests ✅
|
filled: 0, | ||
initialized: 0, | ||
}) | ||
buf.fill(MaybeUninit::new(0)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocators often have an option to directly allocate pre-zeroed memory. How hard would it be to make use of this when not being passed a custom allocator to use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's possible, but a bit cursed maybe? you'd have to check that the allocator is one you know, and can then special-case to use calloc
or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe you can add a field to Allocator
and keep it either None or a function that forwards to the regular alloc function in the code that accepts an allocator from the C side and turns it into an Allocator
instance. This way you did use the zeroed allocator method if the C side didn't specify an allocator to use.
Opened #224 against this PR to fix CI. |
much better than before
but still a ways to go
but, most of these changes are actually beneficial for all compression levels, so at the higher levels we're doing really well