-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Loop plus match #248
Merged
Merged
Loop plus match #248
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Paranoia: If self.length is miscalculated to be past {extra,name,comm}_max, it could result in us sending a pointer out of bounds when we do `ptr.add(self.length)`. Usually the number of bytes would be zero, but sending a pointer OOB with `ptr.add()` is automatic UB even if we don't actually count any bytes. We do have a guardrail against underflowing subtraction against self.length, but we should check and panic instead of saturating. Better safe than sorry.
Add a safety comment around this unchecked unreachable statement, and convert the integer literals to binary literals to make it more obvious that the match statement is matching against bits.
llvm is able to optimize the branch away anyway https://godbolt.org/z/x9srqoMjj so we can just have less unsafe code
this does not yet cover the macros
Benchmark 2 (66 runs): target/release/examples/blogpost-uncompress rs-chunked 4 silesia-small.tar.gz measurement mean ± σ min … max outliers delta wall_time 76.8ms ± 2.42ms 75.6ms … 95.5ms 3 ( 5%) ⚡- 3.9% ± 1.1% peak_rss 24.1MB ± 73.9KB 23.9MB … 24.1MB 0 ( 0%) - 0.0% ± 0.1% cpu_cycles 308M ± 9.07M 305M … 378M 5 ( 8%) ⚡- 4.0% ± 1.0% instructions 941M ± 236 941M … 941M 0 ( 0%) ⚡- 2.3% ± 0.0% cache_references 3.08M ± 335K 2.78M … 5.42M 3 ( 5%) 💩+ 4.5% ± 3.2% cache_misses 126K ± 25.3K 87.0K … 263K 2 ( 3%) 💩+ 94.9% ± 13.4% branch_misses 4.08M ± 7.54K 4.07M … 4.13M 3 ( 5%) - 0.2% ± 0.1%
Codecov ReportAttention: Patch coverage is
|
rnijveld
approved these changes
Nov 14, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
switch from tail calls to loop+match
It turns out that even if we ask LLVM nicely, it will not actually make our collection of functions, all with the same signature, tail call one another.
After finding stack overflows with a fuzzer, we dove into the assembly, and found some cases where the stack grew
LLVM wants to centralize the cleanup before the return (many other blocks jump to
LBB109_311
), thereby invalidating a tail call totype_do
.We were not able to get rid of this call without introducing one elsewhere: we just don't currently have the power to tell LLVM what we want it to do.
So, we switch back to loop+match waiting for changes to rust to make a more efficient implementation possible.
Performance-wise, the damage is relatively minimal: we're just slower in cases where we already were slower than C. We are faster in cases where the relevant code is barely touched (in these cases the logic quickly moves into a hot inner loop and just spends most of its time there).