opt9: batch vle processing #228

msm-cert · 2024-10-14T12:54:11Z

Optimize ANDing uncompressed and compressed sorted runs:

// Performance critical method. As of the current version, in some tests,
// more than half of the time is spent ANDing decompressed and compressed runs.
// We expect the compressed set to be large, and sequence to be short.
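For context, a plain baseline for this kind of AND might look like the sketch below. It is not the code from this PR. It assumes the compressed run stores gaps between consecutive values as base-128 varints (low 7 bits first, high bit set as a continuation flag), and the names `vle_compress` and `and_sorted_with_compressed` are hypothetical. It decodes one value at a time, which is the per-value work that batch processing aims to reduce.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical encoder, only here to build inputs for the sketch.
// Assumed format: each value is stored as the gap from the previous value,
// written as a base-128 varint (7 payload bits per byte, low bits first,
// high bit set on every byte except the last).
std::vector<uint8_t> vle_compress(const std::vector<uint32_t> &sorted) {
    std::vector<uint8_t> out;
    uint32_t prev = 0;
    for (uint32_t v : sorted) {
        uint32_t gap = v - prev;
        while (gap >= 0x80) {
            out.push_back(static_cast<uint8_t>(gap | 0x80));
            gap >>= 7;
        }
        out.push_back(static_cast<uint8_t>(gap));
        prev = v;
    }
    return out;
}

// Baseline AND of a short sorted sequence with a large compressed run.
// Assumes both inputs are strictly increasing and the compressed bytes
// are well formed (at most 5 bytes per varint).
std::vector<uint32_t> and_sorted_with_compressed(
    const std::vector<uint32_t> &seq, const std::vector<uint8_t> &compressed) {
    std::vector<uint32_t> out;
    std::size_t i = 0;     // cursor in the uncompressed sequence
    std::size_t pos = 0;   // cursor in the compressed bytes
    uint32_t current = 0;  // last decoded value
    while (i < seq.size() && pos < compressed.size()) {
        // Decode one varint gap and add it to the running value.
        uint32_t gap = 0;
        int shift = 0;
        uint8_t byte;
        do {
            byte = compressed[pos++];
            gap |= static_cast<uint32_t>(byte & 0x7F) << shift;
            shift += 7;
        } while ((byte & 0x80) != 0 && pos < compressed.size());
        current += gap;

        // Skip sequence elements smaller than the decoded value.
        while (i < seq.size() && seq[i] < current) {
            ++i;
        }
        // Emit the value if both sides contain it.
        if (i < seq.size() && seq[i] == current) {
            out.push_back(current);
            ++i;
        }
    }
    return out;
}
```

The loop stops as soon as the short sequence is exhausted, so the tail of the compressed run is never decoded. Every value before that point is still decoded individually.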

Unfortunately, the fast-case code is quite ugly, but the speedup is worth it: AND gets almost 200% faster, which translates to almost 10% faster total time, depending on the workload [1].

On the other hand, experiments with AVX2 were slightly disappointing: they give another 5% performance, but they are even uglier and make the code dependent on CPU flags (so we would probably need a fallback anyway). Most people on the internet manage to get a larger speedup from AVX, so this may be a skill issue. Anyway, after this PR "and" stops being a bottleneck again.

Similarly, tests with SSE were maybe a bit faster (within the margin of error), but this made the code more complicated. I still believe it's possible to make this significantly faster with enough SIMD magic, but this is good enough, I guess.

[1] If you wonder why the numbers don't add up: speeding up set operations also inexplicably slows down disk reads. This is probably because of disk prefetching going on under the hood.

msm-code added 5 commits October 14, 2024 14:50

opt9: batch vle processing

377e3cb

Refactor the code, hopefully

481661a

Format with clang-format

0bc7d9d

Fix version number

c010742

A few comments

cacf13c

msm-cert merged commit c2ffdea into master Oct 16, 2024
5 checks passed

msm-cert deleted the perf/opt9 branch October 16, 2024 13:44
