opt9: batch vle processing #228

Merged
merged 5 commits into from
Oct 16, 2024

@msm-cert msm-cert commented Oct 14, 2024

Optimize ANDing uncompressed and compressed sorted runs:

// Performance critical method. As of the current version, in some tests,
// more than half of the time is spent ANDing decompressed and compressed runs.
// We expect the compressed set to be large, and sequence to be short.

Unfortunately, the fast-case code is quite ugly, but the speedup is worth it: AND gets almost 200% faster, which translates to almost 10% faster total time, depending on the workload [1].

On the other hand, experiments with AVX2 were slightly disappointing. They give another 5% performance, but are even uglier and also make the code dependent on CPU flags (so we would probably need a fallback anyway). Most people on the internet manage to get a larger speedup from AVX, so this may be a skill issue. Anyway, after this PR "and" stops being a bottleneck again.

Similarly, tests with SSE were maybe a bit faster (within the test margin of error), but they made the code more complicated. I still believe it's possible to make this significantly faster with enough SIMD magic, but that's good enough I guess.

[1] If you wonder why the numbers don't add up - speeding up set operations also inexplicably slows down disk reads. This is probably because of disk prefetching going on under the hood.
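
For readers unfamiliar with the operation, a minimal sketch of ANDing a short decoded sequence with a large compressed run might look like the following. This is not the code from this PR: the byte format (gaps between ids stored as 7-bit varints) and the names `encode_run` and `and_with_compressed` are assumptions made for illustration, and it shows only a plain one-value-at-a-time loop, not the batched fast path.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoder (an assumption, not the project's real format):
// stores the gap between consecutive sorted ids as a varint with
// 7 payload bits per byte and the high bit meaning "more bytes follow".
std::vector<uint8_t> encode_run(const std::vector<uint32_t> &sorted_ids) {
    std::vector<uint8_t> out;
    uint32_t prev = 0;
    for (uint32_t id : sorted_ids) {
        uint32_t gap = id - prev;
        prev = id;
        while (gap >= 0x80) {
            out.push_back(static_cast<uint8_t>(gap | 0x80));
            gap >>= 7;
        }
        out.push_back(static_cast<uint8_t>(gap));
    }
    return out;
}

// Intersects a short, already decoded sorted sequence with a large
// compressed run, decoding the run on the fly in a single pass.
// Assumes both inputs are sorted and the run is well formed.
std::vector<uint32_t> and_with_compressed(const std::vector<uint32_t> &seq,
                                          const std::vector<uint8_t> &run) {
    std::vector<uint32_t> out;
    size_t pos = 0;        // read offset into the compressed bytes
    size_t i = 0;          // index into the short sequence
    uint32_t current = 0;  // most recently decoded id
    while (i < seq.size() && pos < run.size()) {
        // Decode one varint gap.
        uint32_t gap = 0;
        int shift = 0;
        uint8_t byte;
        do {
            byte = run[pos++];
            gap |= static_cast<uint32_t>(byte & 0x7F) << shift;
            shift += 7;
        } while ((byte & 0x80) && pos < run.size());
        current += gap;

        // Skip sequence entries that cannot match anymore.
        while (i < seq.size() && seq[i] < current) {
            ++i;
        }
        if (i < seq.size() && seq[i] == current) {
            out.push_back(current);
            ++i;
        }
    }
    return out;
}
```

Calling `and_with_compressed(seq, encode_run(big))` returns the ids present in both inputs. The loop stops as soon as the short sequence is exhausted, so the tail of the large run is never decoded, which matters when the compressed set is large and the sequence is short.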

@msm-cert msm-cert merged commit c2ffdea into master Oct 16, 2024
5 checks passed
@msm-cert msm-cert deleted the perf/opt9 branch October 16, 2024 13:44