Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Loop plus match #248

Merged
merged 10 commits into from
Nov 14, 2024
Merged

Loop plus match #248

merged 10 commits into from
Nov 14, 2024

Conversation

folkertdev
Copy link
Collaborator

@folkertdev folkertdev commented Nov 14, 2024

switch from tail calls to loop+match

It turns out that even if we ask LLVM nicely, it will not actually make our collection of functions, all with the same signature, tail call one another.

After finding stack overflows with a fuzzer, we dove into the assembly, and found some cases where the stack grew

.LBB109_326:
    mov rdi, rbx
    call zlib_rs::inflate::State::type_do
    jmp .LBB109_311

.LBB109_311:
    lea rsp, [rbp - 40]
    pop rbx
    pop r12
    pop r13
    pop r14
    pop r15
    pop rbp
    .cfi_def_cfa rsp, 8
    ret

LLVM wants to centralize the cleanup before the return (many other blocks jump to LBB109_311), thereby invalidating a tail call to type_do.

We were not able to get rid of this call without introducing one elsewhere: we just don't currently have the power to tell LLVM what we want it to do.

So, we switch back to loop+match waiting for changes to rust to make a more efficient implementation possible.

Performance-wise, the damage is relatively minimal: we're just slower in cases where we already were slower than C. We are faster in cases where the relevant code is barely touched (in these cases the logic quickly moves into a hot inner loop and just spends most of its time there).

folkertdev and others added 9 commits November 14, 2024 11:48
Paranoia: If self.length is miscalculated to be past {extra,name,comm}_max,
it could result in us sending a pointer out of bounds when we do
`ptr.add(self.length)`. Usually the number of bytes would be zero, but sending
a pointer OOB with `ptr.add()` is automatic UB even if we don't actually count
any bytes.

We do have a guardrail against underflowing subtraction against self.length,
but we should check and panic instead of saturating. Better safe than sorry.
Add a safety comment around this unchecked unreachable statement, and convert
the integer literals to binary literals to make it more obvious that the match
statement is matching against bits.
llvm is able to optimize the branch away anyway https://godbolt.org/z/x9srqoMjj so we can just have less unsafe code
this does not yet cover the macros
Benchmark 2 (66 runs): target/release/examples/blogpost-uncompress rs-chunked 4 silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          76.8ms ± 2.42ms    75.6ms … 95.5ms          3 ( 5%)        ⚡-  3.9% ±  1.1%
  peak_rss           24.1MB ± 73.9KB    23.9MB … 24.1MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          308M  ± 9.07M      305M  …  378M           5 ( 8%)        ⚡-  4.0% ±  1.0%
  instructions        941M  ±  236       941M  …  941M           0 ( 0%)        ⚡-  2.3% ±  0.0%
  cache_references   3.08M  ±  335K     2.78M  … 5.42M           3 ( 5%)        💩+  4.5% ±  3.2%
  cache_misses        126K  ± 25.3K     87.0K  …  263K           2 ( 3%)        💩+ 94.9% ± 13.4%
  branch_misses      4.08M  ± 7.54K     4.07M  … 4.13M           3 ( 5%)          -  0.2% ±  0.1%
Copy link

codecov bot commented Nov 14, 2024

Codecov Report

Attention: Patch coverage is 95.93220% with 36 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
zlib-rs/src/inflate.rs 96.00% 28 Missing ⚠️
test-libz-rs-sys/src/inflate.rs 95.65% 8 Missing ⚠️
Files with missing lines Coverage Δ
test-libz-rs-sys/src/inflate.rs 97.27% <95.65%> (-0.18%) ⬇️
zlib-rs/src/inflate.rs 95.43% <96.00%> (+0.36%) ⬆️

... and 2 files with indirect coverage changes

@rnijveld rnijveld merged commit 830f886 into main Nov 14, 2024
18 checks passed
@rnijveld rnijveld deleted the loop-plus-match branch November 14, 2024 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants