Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use insertion sort for small sort_and_dedup_dst_buf inputs #459

Merged
merged 2 commits into from
Jan 9, 2024

Conversation

silentbicycle
Copy link
Collaborator

As a follow up to #456 -- kate mentioned that examples/words still had the same outliers (#456 (comment)). As hashing dropped lower in the profile, spending a disproportionate amount of time in sort_and_dedup_dst_buf showed up. This makes sense, because the outliers in the words output come from blab's successive seeds generating specific outputs (such as combining 855 instances of the regex GyADewtfILV53SSPua4IKbEZbug6BWTJ8o22K6ydeXs0NWsMADsGyADewtfILV53SSPua4IKbEZbug6BWTJ8o22K6ydeXs0NWsMA...) that are pathological knots for determinisation to untangle, not just a gradually increasing input size.

Currently, sort_and_dedup_dst_buf handles its input with a couple strategies:

  • a single state: return immediately, no change
  • only a single value (duplicated): return the one value, done
  • an array where max-min is small fairly small: use a stack-allocated bit set to simultaneously sort and dedup, to avoid qsort's indirection overhead when possible
  • otherwise: qsort

The outlier appeared to be due to from frequently calling it with inputs which only had a few values, but a large delta between the min and max values, so it was effectively zeroing a couple KB on the stack, setting just a few bits, and then sweeping over it again to write out an array of the set bits -- when the input array is fairly small (< = SMALL_INPUT_LIMIT), it's faster to append ascending values into a temporary buffer (so already sorted input just copies over) and then shift it forward as necessary to plug in the remaining values. In other words, use insertion sort when the entire data set fits within a few cache lines. Iit often does.

When using a words.sh output test data set kate sent me for benchmarking, this brings the total runtime for words -dt from about 11.7 seconds to about 8.4 seconds on my laptop.

I also added a check for whether the input is already sorted and deduplicated, so the function doesn't need to do anything. During benchmarking that case was only reached a small percent of the time, but it's very cheap to also note that while sweeping through to find the min and max values.

I'm still experimenting with the limit, but it's common for
sort_and_dedup_dst_buf to be called with a fairly small input array. The
bitset approach's cost is proportional to the max - min value delta, so
when there's only a few entries it may be faster to do insertion sort
instead -- 16 entries will probably fit in a single cache line.

Changing SMALL_INPUT_LIMIT changes cutoff, commenting it out disables
that approach outright. Setting SMALL_INPUT_CHECK to 1 enables a
pass at the end that verifies the array is actually sorted and unique.

Also, check upfront whether the input array is already sorted and
unique. Experiments so far suggest that this doesn't pay off very often,
but it's very cheap since we're already sweeping over the input to the
min and max values.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
I benchmarked with a range of values 8 to 4096, somewhere around
32 - 256 worked particularly well, and it started to become slower
beyond that, so set it to <32 for now.
@katef
Copy link
Owner

katef commented Jan 9, 2024

As of this PR:

; ./words.sh 40000 100 | guff -b
    x: [0 - 19]    y: [0 - 3967]
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠂
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⠐⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠐⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠂⠀⠂⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠄⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⠀⠀⠀⠀⠀⠀⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⡇⠀⠀⢀⠀⠠⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣇⣀⣄⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀⣀

@katef katef merged commit 259d507 into main Jan 9, 2024
@katef katef deleted the sv/use-insertion-sort-for-small-sort-and-dedup-inputs branch January 9, 2024 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants