-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc++] Simplify __bitset::__init #121357
base: main
Are you sure you want to change the base?
Conversation
4446bf6
to
d6bfafe
Compare
@llvm/pr-subscribers-libcxx Author: Peng Liu (winner245) ChangesThis PR simplifies Full diff: https://github.com/llvm/llvm-project/pull/121357.diff 1 Files Affected:
diff --git a/libcxx/include/bitset b/libcxx/include/bitset
index 919d2a0f07e096..57c29f5f85d27c 100644
--- a/libcxx/include/bitset
+++ b/libcxx/include/bitset
@@ -253,26 +253,16 @@ inline _LIBCPP_CONSTEXPR __bitset<_N_words, _Size>::__bitset() _NOEXCEPT
template <size_t _N_words, size_t _Size>
void __bitset<_N_words, _Size>::__init(unsigned long long __v, false_type) _NOEXCEPT {
- __storage_type __t[sizeof(unsigned long long) / sizeof(__storage_type)];
- size_t __sz = _Size;
- for (size_t __i = 0; __i < sizeof(__t) / sizeof(__t[0]); ++__i, __v >>= __bits_per_word, __sz -= __bits_per_word)
- if (__sz < __bits_per_word)
- __t[__i] = static_cast<__storage_type>(__v) & (1ULL << __sz) - 1;
- else
- __t[__i] = static_cast<__storage_type>(__v);
-
- std::copy(__t, __t + sizeof(__t) / sizeof(__t[0]), __first_);
- std::fill(
- __first_ + sizeof(__t) / sizeof(__t[0]), __first_ + sizeof(__first_) / sizeof(__first_[0]), __storage_type(0));
+ const size_t __ull_words = sizeof(unsigned long long) / sizeof(__storage_type);
+ for (size_t __i = 0; __i < __ull_words; ++__i, __v >>= __bits_per_word)
+ __first_[__i] = static_cast<__storage_type>(__v);
+ std::fill(__first_ + __ull_words, __first_ + _N_words, __storage_type(0));
}
template <size_t _N_words, size_t _Size>
inline _LIBCPP_HIDE_FROM_ABI void __bitset<_N_words, _Size>::__init(unsigned long long __v, true_type) _NOEXCEPT {
__first_[0] = __v;
- if (_Size < __bits_per_word)
- __first_[0] &= (1ULL << _Size) - 1;
-
- std::fill(__first_ + 1, __first_ + sizeof(__first_) / sizeof(__first_[0]), __storage_type(0));
+ std::fill(__first_ + 1, __first_ + _N_words, __storage_type(0));
}
# endif // _LIBCPP_CXX03_LANG
@@ -623,7 +613,8 @@ public:
// 23.3.5.1 constructors:
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bitset() _NOEXCEPT {}
- _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bitset(unsigned long long __v) _NOEXCEPT : __base(__v) {}
+ _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bitset(unsigned long long __v) _NOEXCEPT
+ : __base(sizeof(unsigned long long) * CHAR_BIT <= _Size ? __v : __v & ((1ULL << _Size) - 1)) {}
template <class _CharT, __enable_if_t<_IsCharLikeType<_CharT>::value, int> = 0>
_LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 explicit bitset(
const _CharT* __str,
|
} | ||
|
||
template <size_t _N_words, size_t _Size> | ||
inline _LIBCPP_HIDE_FROM_ABI void __bitset<_N_words, _Size>::__init(unsigned long long __v, true_type) _NOEXCEPT { | ||
__first_[0] = __v; | ||
if (_Size < __bits_per_word) | ||
__first_[0] &= (1ULL << _Size) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like the kind of thing that was added as an optimization. Could you make sure that we don't regress the performance with this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback! I've run some benchmarks and based on the results, there doesn't seem to be any performance difference.
--------------------------------------------
Benchmark Before After
--------------------------------------------
BM_ctor_ull/1 0.228 ns 0.230 ns
BM_ctor_ull/2 0.233 ns 0.228 ns
BM_ctor_ull/3 0.231 ns 0.232 ns
BM_ctor_ull/4 0.229 ns 0.229 ns
BM_ctor_ull/5 0.231 ns 0.230 ns
BM_ctor_ull/6 0.232 ns 0.233 ns
BM_ctor_ull/7 0.230 ns 0.228 ns
BM_ctor_ull/8 0.233 ns 0.235 ns
BM_ctor_ull/9 0.232 ns 0.233 ns
BM_ctor_ull/10 0.232 ns 0.233 ns
BM_ctor_ull/11 0.235 ns 0.233 ns
BM_ctor_ull/12 0.230 ns 0.228 ns
BM_ctor_ull/13 0.231 ns 0.236 ns
BM_ctor_ull/14 0.231 ns 0.234 ns
BM_ctor_ull/15 0.232 ns 0.232 ns
BM_ctor_ull/16 0.233 ns 0.236 ns
BM_ctor_ull/17 0.234 ns 0.237 ns
BM_ctor_ull/18 0.234 ns 0.236 ns
BM_ctor_ull/19 0.235 ns 0.236 ns
BM_ctor_ull/20 0.232 ns 0.232 ns
BM_ctor_ull/21 0.236 ns 0.235 ns
BM_ctor_ull/22 0.233 ns 0.230 ns
BM_ctor_ull/23 0.234 ns 0.237 ns
BM_ctor_ull/24 0.236 ns 0.236 ns
BM_ctor_ull/25 0.234 ns 0.234 ns
BM_ctor_ull/26 0.237 ns 0.235 ns
BM_ctor_ull/27 0.233 ns 0.236 ns
BM_ctor_ull/28 0.234 ns 0.238 ns
BM_ctor_ull/29 0.235 ns 0.239 ns
BM_ctor_ull/30 0.234 ns 0.240 ns
BM_ctor_ull/31 0.235 ns 0.234 ns
BM_ctor_ull/32 0.234 ns 0.231 ns
BM_ctor_ull/33 0.235 ns 0.237 ns
BM_ctor_ull/34 0.235 ns 0.234 ns
BM_ctor_ull/35 0.235 ns 0.237 ns
BM_ctor_ull/36 0.236 ns 0.234 ns
BM_ctor_ull/37 0.233 ns 0.236 ns
BM_ctor_ull/38 0.234 ns 0.235 ns
BM_ctor_ull/39 0.233 ns 0.236 ns
BM_ctor_ull/40 0.233 ns 0.234 ns
BM_ctor_ull/41 0.234 ns 0.231 ns
BM_ctor_ull/42 0.236 ns 0.233 ns
BM_ctor_ull/43 0.233 ns 0.232 ns
BM_ctor_ull/44 0.233 ns 0.240 ns
BM_ctor_ull/45 0.230 ns 0.234 ns
BM_ctor_ull/46 0.231 ns 0.235 ns
BM_ctor_ull/47 0.233 ns 0.233 ns
BM_ctor_ull/48 0.235 ns 0.231 ns
BM_ctor_ull/49 0.232 ns 0.235 ns
BM_ctor_ull/50 0.230 ns 0.235 ns
BM_ctor_ull/51 0.234 ns 0.234 ns
BM_ctor_ull/52 0.244 ns 0.235 ns
BM_ctor_ull/53 0.242 ns 0.234 ns
BM_ctor_ull/54 0.234 ns 0.236 ns
BM_ctor_ull/55 0.235 ns 0.236 ns
BM_ctor_ull/56 0.238 ns 0.234 ns
BM_ctor_ull/57 0.242 ns 0.234 ns
BM_ctor_ull/58 0.237 ns 0.236 ns
BM_ctor_ull/59 0.239 ns 0.230 ns
BM_ctor_ull/60 0.236 ns 0.236 ns
BM_ctor_ull/61 0.235 ns 0.238 ns
BM_ctor_ull/62 0.239 ns 0.232 ns
BM_ctor_ull/63 0.237 ns 0.238 ns
where BM_ctor_ull/i
refers to a test case for the unsigned long long
constructor with an argument (1ULL << i) - 1
(which has exactly i
1s). My understanding is that there is not much we can optimize here since the input is only a scalar value. Therefore, my main objective of this PR is to simplify the code.
d6bfafe
to
0f02dfb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think instead of trying to simplify the C++03 code, we should look into removing __init
entirely and replace the code with something like
bitset(unsigned long long __v) : vals() {
vals[0] = static_cast<size_t>(__v);
if (sizeof(size_t) == 4) {
__first_[1] = _Size >= 2 * __bits_per_word
? static_cast<size_t>(__v >> __bits_per_word)
: static_cast<size_t>((__v >> __bits_per_word) &
(__storage_type(1) << (_Size - __bits_per_word)) - 1);
}
}
a1b6ecb
to
e95d208
Compare
e95d208
to
cf0d4d7
Compare
After several attempts with a stage1 (generic-gcc, gcc-14, g++-14) CI failure, I realized that we still have to keep the initialization of the Given this limitation, it seems that we still have to move the initialization back to the constructor initializer list. Since C++03 does not support |
This PR simplifies
__bitset::__init
into a more compact and readable form, which avoids redundant computations of asize_t
value and eliminates the overhead of a local array.