Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] Cumulative micro opts (1-2% perf. gain) #1092

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

ryoqun
Copy link
Contributor

@ryoqun ryoqun commented Mar 8, 2024

Well, I REALLY want my other prs to be merged (or decided to be rejected, so that i can decide to fork crossbeam for my work....). All of them got stale for long time for some reason.

So, I'm wondering whether it could help by actively contributing to crossbeam or not, in order to indicate my willingness to spend my time in exchange of my prs got being reviewed. :)

here it is. Assorted of micro ops, i came up with. note that I haven't evaluated each in depth. This pr is to start triage them from more knowledgeable people. so that I can create more serious prs for each.

General theme of them is to reduce instructions.

(note to self):

  • inject to-be-dropped buffer from receiver to sender
  • new with at-the-end of index so that initial alloc can happen even with non-option buffer

@taiki-e
Copy link
Member

taiki-e commented May 2, 2024

So, I'm wondering whether it could help by actively contributing to crossbeam or not, in order to indicate my willingness to spend my time in exchange of my prs got being reviewed. :)

I appreciate the contribution, but a patch that mixes various different-purpose changes without any particular explanation of each change is also the kind of patch that takes time to review...


As for the changes:

  • Subword (8-bit and 16-bit) atomic can be very inefficient on platforms like riscv, and benchmarks that actually use it in the unbounded queue have shown that it doesn't perform that well even on x86_64 (Add benchmarks for unbounded queues smol-rs/concurrent-queue#13). I think atomic of the same size as c_int would probably be ok, but it needs a separate benchmark anyway.
  • Separation of slots to states and msgs may increase the chance of encountering false sharing (Improvement on cache invalidation #462). I think it is possible that it could be handled well in the way described in https://arxiv.org/pdf/1908.04511, but it needs a separate benchmark anyway.
  • At this point I realize that there is no information on how the 1-2% gain was actually measured. For example, if the result of all this change is as little as a 1% improvement in a single microbenchmark, the worth of the improvement relative to the complexity of the change is questionable.
  • assume_init_drop -> ptr::read + drop is expected to degrade performance when messages are relatively large due to the extra moves.
  • Most of the other changes seem to be code shuffling, but it was not clear from what little I saw what exactly you were trying to improve.

If this PR's purpose is to help merge your other PRs, I honestly think it would be better to add tests to complete them. It is not easy to review PRs that add new APIs but have no tests (and there is no possibility of such PRs being merged as is).

@ryoqun
Copy link
Contributor Author

ryoqun commented May 4, 2024

(thanks for various feedback; I'll appreciate very much; I'll reply to these later. focusing on #1040 for now)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

2 participants