-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add niche optimization #49
base: master
Are you sure you want to change the base?
Conversation
This PR adds niche optimization. This lets the compiler use values outside of the valid integer range to store e.g. other enum variants. This makes `Option<u7>` one byte. The internal representation is own the same as the related primitive type, i.e. every u30 can be transmuted to u32. The other direction is possible if the u32 is representable in the u30 range.
Nice; with the struct trick it should not make build times too bad. Thanks for implementing this! I'll definitely give it a try and report back here. (I might also look into whether types that previously made little sense (eg. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for this PR. I think this is great!
Do you know if we have managed to not expose any of the underlying implementation details such that this can be considered backwards compatible? Or have we given any guarantees that this will break? If you're not sure I will try to dig into it one of the next days.
Would it make sense to add some sort of test to verify that the niche optimization is being made when possible? Maybe there are other optimization we would like to implement down the road, then such test would be helpful to make sure we don't break anything. Let me know what you think.
I will try to check out the actual build time one of the next days and I also hope @chrysn gets a chance to comment after testing it.
This speeds up the compilation by 50% (YMMV), because code is only generated if used. The methods should generate trivial instructions for most methods, so there should be no drawbacks in using this annotation.
I code it fully self-contained. The API was not changed in any way, nothing added, nothing removed. But any code that relied on the previous, undocumented behavior, that you could set the unused bits to any value, will break. This could only be done on
static_assertions could be used to ensure that the optimizations work. I'll have a look what the best way is to add it.
I added a commit to this PR, that makes every method |
So I just finished testing how this pull affects compile times and here are the results. The test was I ran
My testing shows this pull will increase compile time by about 10%. That is the raw data. |
I've added a test that checks the layout (size + alignment) of uX is the same as its primitive type, and that the niche optimization for |
I optimized the macro invocation, and used the same test as @ironhaven (best of three runs)
cargo 1.63.0 (fd9c4297c 2022-07-01), AMD Ryzen 7 2700X (8 cores @ 3.7GHz) The macro optimization could easily be used without adding niche optimization, so it would be a lie for me to claim this PR comes without a cost. |
@chrysn do you think you will have a chance to test it out before merge? Or should I try to move this on without? |
I've attempted a larger trial run, but that got stuck in too many places where I'd need to introduce additional I've done a few tests more on the micro side, finding that
That doesn't make me particularly happy, but it's still an improvement, and most importantly, to my understanding, further improvements would not need to be done here, but would be up to the Rust compiler. So I have no clear recommendation -- it's definitely not a "no", but it's not the enthusiastic "yes" I hoped to provide either. Given you were about to merge even without my tests, I'd guess that's a "go ahead". |
@chrysn, that should be fixed by rust-lang/rust#94075. I don't know when this change lands in |
I should have tested on nightly right away -- a Result<u31, multi-valued-enum> is apparently optimized starting with current beta, 1.65-to-be. What it still can't do is optimizing for enums with values (eg. So ... well, one step closer even. Thanks, I like it :-) |
I was trying to see what was still needed for this PR. Seems like it's just build times that are a concern? Here's the latest on my M1 MacBook Pro, if this is of any help (hello world cargo binary using uX as a dep, with just enough code so it's not elided,
It's a 3% gain, but with some overlap in the data. |
This PR adds niche optimization. This lets the compiler use values outside of the valid integer range to store e.g. other enum variants. This makes
Option<u7>
one byte.The internal representation is own the same as the related primitive type, i.e. every u30 can be transmuted to u32. The other direction is possible if the u32 is representable in the u30 range.