Improve performance by skipping char copies #1398

Open: wants to merge 1 commit into main

Slackadays

All of the lambdas that check whether a character has a certain value take their char by value. This creates a lot of unnecessary copies, and in a hot loop it causes a huge slowdown. I had just that happen in my RISC-V assembler (Slackadays/Chata@727838c): switching char c (actually int c for the standard library functions) to const char& c, so the value is passed by const reference, yielded an instant 15% speedup for free. Best of all, this change does not affect the lambdas' usage semantics.
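For readers who want to see the two parameter styles side by side, here is a minimal sketch. The predicate names and the count_matching helper are hypothetical and are not taken from the mold or Chata sources; the sketch only shows that both forms are called the same way and return the same result, and says nothing about their relative speed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical predicates for illustration; not code from mold or Chata.
static const auto is_blank_by_value = [](char c) { return c == ' ' || c == '\t'; };
static const auto is_blank_by_ref = [](const char &c) { return c == ' ' || c == '\t'; };

// Counts the characters of `s` that satisfy `pred`.
// The call `pred(c)` is written identically for both parameter styles.
template <typename Pred>
std::size_t count_matching(const std::string &s, Pred pred) {
  std::size_t n = 0;
  for (const char &c : s)
    if (pred(c))
      n++;
  return n;
}

// Checks that both forms give the same count for `s`.
inline void check_same_result(const std::string &s) {
  assert(count_matching(s, is_blank_by_value) ==
         count_matching(s, is_blank_by_ref));
}
```

Because the call sites do not change, any difference between the two forms would show up only in the generated code, which is what the rest of the thread discusses.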

@rui314
Owner

rui314 commented Jan 2, 2025

All these lambdas can be inlined, and I believe compilers are generally smart enough to inline them, or at least can figure out that the char values passed to these lambdas are not mutated inside the function and apply optimization based on that analysis.

Also copying a char by value is faster than passing a char as a reference, no? char is 1 byte, so copying it is as cheap as or cheaper than passing a pointer to a char.
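The size argument above can be checked at compile time. This is a rough sketch with hypothetical function names; it assumes the common case where a reference parameter of a non-inlined function is passed as an address, which is an implementation detail rather than something the C++ standard mandates.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical out-of-line predicates for illustration.
bool is_digit_by_value(char c) { return '0' <= c && c <= '9'; }
bool is_digit_by_ref(const char &c) { return '0' <= c && c <= '9'; }

// sizeof(char) is 1 by definition in C++.
static_assert(sizeof(char) == 1, "char is one byte by definition");

// Assumption: when the call is not inlined, the reference is passed as an
// address, so the argument is pointer-sized (often 8 bytes on 64-bit targets).
static_assert(sizeof(const char *) >= sizeof(char),
              "a pointer is at least as large as a char");

constexpr std::size_t value_arg_size = sizeof(char);
constexpr std::size_t ref_arg_size = sizeof(const char *);

// Both forms classify characters identically.
inline void check_digit_predicates() {
  assert(is_digit_by_value('7') && is_digit_by_ref('7'));
  assert(!is_digit_by_value('x') && !is_digit_by_ref('x'));
}
```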

@Slackadays
Author

> All these lambdas can be inlined, and I believe compilers are generally smart enough to inline them, or at least can figure out that the char values passed to these lambdas are not mutated inside the function and apply optimization based on that analysis.

You'd think so, and I did too, but that wasn't the case with GCC 14, which is where I got the 15% performance boost.

> Also copying a char by value is faster than passing a char as a reference, no? char is 1 byte, so copying it is as cheap as or cheaper than passing a pointer to a char.

Actually not, since if you pass it by reference, you can reuse existing registers/memory to make comparisons to the character, but if you pass it by value, the compiler has to allocate a new register just for the comparison.
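Claims like these are easiest to settle by measuring. Below is a rough micro-benchmark sketch using std::chrono; the names time_predicate, compare_forms, is_comma_by_value and is_comma_by_ref are hypothetical. It is only a starting point: the numbers depend heavily on compiler, flags and inlining, and a loop this small is not a substitute for timing the real program.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical predicates for illustration.
static const auto is_comma_by_value = [](char c) { return c == ','; };
static const auto is_comma_by_ref = [](const char &c) { return c == ','; };

// Runs `pred` over `text` for `rounds` passes and returns the elapsed time in
// nanoseconds. The match count is written to `hits` so the loop has an
// observable result.
template <typename Pred>
long long time_predicate(const std::string &text, int rounds, Pred pred,
                         std::size_t &hits) {
  std::size_t n = 0;
  auto start = std::chrono::steady_clock::now();
  for (int r = 0; r < rounds; r++)
    for (const char &c : text)
      if (pred(c))
        n++;
  auto stop = std::chrono::steady_clock::now();
  hits = n;
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
      .count();
}

// Times both forms on the same input and checks that they agree on the count.
inline void compare_forms(const std::string &text, int rounds) {
  std::size_t hits_value = 0, hits_ref = 0;
  long long ns_value = time_predicate(text, rounds, is_comma_by_value, hits_value);
  long long ns_ref = time_predicate(text, rounds, is_comma_by_ref, hits_ref);
  assert(hits_value == hits_ref);
  (void)ns_value;  // print or log these two values to compare timings
  (void)ns_ref;
}
```

Comparing the generated assembly for the two forms at the same optimization level is a useful complement, since identical code means any timing difference is noise.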

@rui314
Owner

rui314 commented Jan 2, 2025

I applied this patch, and it looks like only the change to filetype.cc makes a difference in the compiled code, and even there the generated code doesn't look very different. Even if it were, that function is not in a performance-critical pass, so we don't need to optimize it.

@Slackadays
Author

It's weird, then, that most of them made no difference at all, but maybe it has something to do with standard library versus lambda optimization.

@rui314
Owner

rui314 commented Jan 2, 2025

You may want to try that yourself. Anyway, we optimize only the code that matters and leave the rest alone. I appreciate your interest in our code. However, I'd accept PRs like this only when they make a measurable performance difference in a real-world benchmark using mold's --perf option.
