MPMC 2-stack queue #112
I agree: that CPU is no longer supported by its vendor. See https://www.amd.com/en/support/download/drivers.html and https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amd-ucode/README?showmsg=1, and note the lack of microcode updates since 2018: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/log/amd-ucode/microcode_amd_fam15h.bin.asc I'd suggest using at least a Zen 1 CPU for AMD and Skylake for Intel.
The short comparison here doesn't show confidence intervals, but rerunning it shows similar variation on master:
Note that the
Yes, the "statistics" in multicore-bench are very rudimentary. The idea has been to move that to the benchmarking service/frontend and have the benchmarking db store only the raw results. That way it should be possible to view and analyze the data in multiple ways. I'm not sure when we might get around to doing that, however.
A version of this exists in Picos.
This PR implements an MPMC queue using two stacks. This is a new lock-free queue algorithm / data structure. It uses two stacks, one for the tail and one for the head of the queue. Operations on the tail (pushes) and on the head (pops) work as they do on a Treiber stack. A simple lock-free algorithm transfers elements from the tail to the head when a pop is attempted on an empty head while the tail is non-empty.
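To make the two-stack idea concrete, here is a minimal sketch. It is not the algorithm from this PR: for simplicity it keeps both stacks in a single atomic, so pushes and pops contend on the same location, which is exactly what the PR's design avoids. The names `state`, `create`, `push`, and `pop_opt` are hypothetical and chosen only for illustration.

```ocaml
(* Hypothetical, simplified sketch; NOT the implementation from this PR.
   Both stacks live in one immutable record behind a single Atomic, so
   every operation is a Treiber-style compare-and-set retry loop. *)
type 'a state = { head : 'a list; tail : 'a list }
type 'a t = 'a state Atomic.t

let create () : 'a t = Atomic.make { head = []; tail = [] }

(* Push onto the tail stack, retrying if another domain got there first. *)
let rec push (q : 'a t) (x : 'a) : unit =
  let before = Atomic.get q in
  let after = { before with tail = x :: before.tail } in
  if not (Atomic.compare_and_set q before after) then push q x

(* Pop from the head stack. When the head is empty, reverse the tail and
   install it as the new head in the same compare-and-set. *)
let rec pop_opt (q : 'a t) : 'a option =
  let before = Atomic.get q in
  match before.head with
  | x :: head ->
      if Atomic.compare_and_set q before { before with head } then Some x
      else pop_opt q
  | [] -> (
      match List.rev before.tail with
      | [] -> None
      | x :: head ->
          if Atomic.compare_and_set q before { head; tail = [] } then Some x
          else pop_opt q)
```

For example, after `push q 1; push q 2`, the first `pop_opt q` reverses the tail and returns `Some 1`, and later pops are served from the head stack until it runs empty again.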
The interesting feature of this queue is that it seems to outperform an optimized Michael-Scott queue (see #122) on many machines. Here are results from a benchmark run on my M3 Max:
As one can see, the (median) throughput is substantially higher than that of the optimized Michael-Scott queue, and that seems to be the case on most machines I've tested this on. Most interestingly, however, on the "fermat" machine we use for benchmarking, the Michael-Scott queue seems to perform better. See my comment below for a possible explanation.