Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++] Simplify __bitset::__init #121357

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

winner245
Copy link
Contributor

@winner245 winner245 commented Dec 30, 2024

This PR simplifies __bitset::__init into a more compact and readable form, which avoids redundant computations of a size_t value and eliminates the overhead of a local array.

@winner245 winner245 force-pushed the simplify_bitset_init branch 5 times, most recently from 4446bf6 to d6bfafe Compare January 6, 2025 13:40
@winner245 winner245 marked this pull request as ready for review January 8, 2025 16:30
@winner245 winner245 requested a review from a team as a code owner January 8, 2025 16:30
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jan 8, 2025
@llvmbot
Copy link
Member

llvmbot commented Jan 8, 2025

@llvm/pr-subscribers-libcxx

Author: Peng Liu (winner245)

Changes

This PR simplifies __bitset::__init into a more compact and readable form, which avoids redundant computations of a size_t value and eliminates the overhead of a local array.


Full diff: https://github.com/llvm/llvm-project/pull/121357.diff

1 Files Affected:

  • (modified) libcxx/include/bitset (+7-16)
diff --git a/libcxx/include/bitset b/libcxx/include/bitset
index 919d2a0f07e096..57c29f5f85d27c 100644
--- a/libcxx/include/bitset
+++ b/libcxx/include/bitset
@@ -253,26 +253,16 @@ inline _LIBCPP_CONSTEXPR __bitset<_N_words, _Size>::__bitset() _NOEXCEPT
 
 template <size_t _N_words, size_t _Size>
 void __bitset<_N_words, _Size>::__init(unsigned long long __v, false_type) _NOEXCEPT {
-  __storage_type __t[sizeof(unsigned long long) / sizeof(__storage_type)];
-  size_t __sz = _Size;
-  for (size_t __i = 0; __i < sizeof(__t) / sizeof(__t[0]); ++__i, __v >>= __bits_per_word, __sz -= __bits_per_word)
-    if (__sz < __bits_per_word)
-      __t[__i] = static_cast<__storage_type>(__v) & (1ULL << __sz) - 1;
-    else
-      __t[__i] = static_cast<__storage_type>(__v);
-
-  std::copy(__t, __t + sizeof(__t) / sizeof(__t[0]), __first_);
-  std::fill(
-      __first_ + sizeof(__t) / sizeof(__t[0]), __first_ + sizeof(__first_) / sizeof(__first_[0]), __storage_type(0));
+  const size_t __ull_words = sizeof(unsigned long long) / sizeof(__storage_type);
+  for (size_t __i = 0; __i < __ull_words; ++__i, __v >>= __bits_per_word)
+    __first_[__i] = static_cast<__storage_type>(__v);
+  std::fill(__first_ + __ull_words, __first_ + _N_words, __storage_type(0));
 }
 
 template <size_t _N_words, size_t _Size>
 inline _LIBCPP_HIDE_FROM_ABI void __bitset<_N_words, _Size>::__init(unsigned long long __v, true_type) _NOEXCEPT {
   __first_[0] = __v;
-  if (_Size < __bits_per_word)
-    __first_[0] &= (1ULL << _Size) - 1;
-
-  std::fill(__first_ + 1, __first_ + sizeof(__first_) / sizeof(__first_[0]), __storage_type(0));
+  std::fill(__first_ + 1, __first_ + _N_words, __storage_type(0));
 }
 
 #  endif // _LIBCPP_CXX03_LANG
@@ -623,7 +613,8 @@ public:
 
   // 23.3.5.1 constructors:
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bitset() _NOEXCEPT {}
-  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bitset(unsigned long long __v) _NOEXCEPT : __base(__v) {}
+  _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR bitset(unsigned long long __v) _NOEXCEPT
+      : __base(sizeof(unsigned long long) * CHAR_BIT <= _Size ? __v : __v & ((1ULL << _Size) - 1)) {}
   template <class _CharT, __enable_if_t<_IsCharLikeType<_CharT>::value, int> = 0>
   _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX23 explicit bitset(
       const _CharT* __str,

}

template <size_t _N_words, size_t _Size>
inline _LIBCPP_HIDE_FROM_ABI void __bitset<_N_words, _Size>::__init(unsigned long long __v, true_type) _NOEXCEPT {
__first_[0] = __v;
if (_Size < __bits_per_word)
__first_[0] &= (1ULL << _Size) - 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the kind of thing that was added as an optimization. Could you make sure that we don't regress the performance with this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! I've run some benchmarks and based on the results, there doesn't seem to be any performance difference.

--------------------------------------------
Benchmark             Before           After
--------------------------------------------
BM_ctor_ull/1       0.228 ns        0.230 ns
BM_ctor_ull/2       0.233 ns        0.228 ns
BM_ctor_ull/3       0.231 ns        0.232 ns
BM_ctor_ull/4       0.229 ns        0.229 ns
BM_ctor_ull/5       0.231 ns        0.230 ns
BM_ctor_ull/6       0.232 ns        0.233 ns
BM_ctor_ull/7       0.230 ns        0.228 ns
BM_ctor_ull/8       0.233 ns        0.235 ns
BM_ctor_ull/9       0.232 ns        0.233 ns
BM_ctor_ull/10      0.232 ns        0.233 ns
BM_ctor_ull/11      0.235 ns        0.233 ns
BM_ctor_ull/12      0.230 ns        0.228 ns
BM_ctor_ull/13      0.231 ns        0.236 ns
BM_ctor_ull/14      0.231 ns        0.234 ns
BM_ctor_ull/15      0.232 ns        0.232 ns  
BM_ctor_ull/16      0.233 ns        0.236 ns   
BM_ctor_ull/17      0.234 ns        0.237 ns   
BM_ctor_ull/18      0.234 ns        0.236 ns   
BM_ctor_ull/19      0.235 ns        0.236 ns   
BM_ctor_ull/20      0.232 ns        0.232 ns   
BM_ctor_ull/21      0.236 ns        0.235 ns   
BM_ctor_ull/22      0.233 ns        0.230 ns   
BM_ctor_ull/23      0.234 ns        0.237 ns   
BM_ctor_ull/24      0.236 ns        0.236 ns   
BM_ctor_ull/25      0.234 ns        0.234 ns   
BM_ctor_ull/26      0.237 ns        0.235 ns   
BM_ctor_ull/27      0.233 ns        0.236 ns   
BM_ctor_ull/28      0.234 ns        0.238 ns   
BM_ctor_ull/29      0.235 ns        0.239 ns   
BM_ctor_ull/30      0.234 ns        0.240 ns   
BM_ctor_ull/31      0.235 ns        0.234 ns   
BM_ctor_ull/32      0.234 ns        0.231 ns   
BM_ctor_ull/33      0.235 ns        0.237 ns   
BM_ctor_ull/34      0.235 ns        0.234 ns   
BM_ctor_ull/35      0.235 ns        0.237 ns   
BM_ctor_ull/36      0.236 ns        0.234 ns   
BM_ctor_ull/37      0.233 ns        0.236 ns   
BM_ctor_ull/38      0.234 ns        0.235 ns   
BM_ctor_ull/39      0.233 ns        0.236 ns   
BM_ctor_ull/40      0.233 ns        0.234 ns 
BM_ctor_ull/41      0.234 ns        0.231 ns  
BM_ctor_ull/42      0.236 ns        0.233 ns  
BM_ctor_ull/43      0.233 ns        0.232 ns  
BM_ctor_ull/44      0.233 ns        0.240 ns  
BM_ctor_ull/45      0.230 ns        0.234 ns  
BM_ctor_ull/46      0.231 ns        0.235 ns 
BM_ctor_ull/47      0.233 ns        0.233 ns 
BM_ctor_ull/48      0.235 ns        0.231 ns 
BM_ctor_ull/49      0.232 ns        0.235 ns 
BM_ctor_ull/50      0.230 ns        0.235 ns 
BM_ctor_ull/51      0.234 ns        0.234 ns 
BM_ctor_ull/52      0.244 ns        0.235 ns  
BM_ctor_ull/53      0.242 ns        0.234 ns 
BM_ctor_ull/54      0.234 ns        0.236 ns
BM_ctor_ull/55      0.235 ns        0.236 ns
BM_ctor_ull/56      0.238 ns        0.234 ns
BM_ctor_ull/57      0.242 ns        0.234 ns
BM_ctor_ull/58      0.237 ns        0.236 ns
BM_ctor_ull/59      0.239 ns        0.230 ns
BM_ctor_ull/60      0.236 ns        0.236 ns
BM_ctor_ull/61      0.235 ns        0.238 ns
BM_ctor_ull/62      0.239 ns        0.232 ns
BM_ctor_ull/63      0.237 ns        0.238 ns

where BM_ctor_ull/i refers to a test case for the unsigned long long constructor with an argument (1ULL << i) - 1 (which has exactly i 1s). My understanding is that there is not much we can optimize here since the input is only a scalar value. Therefore, my main objective of this PR is to simplify the code.

@winner245 winner245 force-pushed the simplify_bitset_init branch from d6bfafe to 0f02dfb Compare February 24, 2025 23:41
Copy link
Contributor

@philnik777 philnik777 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of trying to simplify the C++03 code, we should look into removing __init entirely and replace the code with something like

bitset(unsigned long long __v) : vals() {
        vals[0] = static_cast<size_t>(__v);
  if (sizeof(size_t) == 4) {
    __first_[1] = _Size >= 2 * __bits_per_word
                   ? static_cast<size_t>(__v >> __bits_per_word)
                   : static_cast<size_t>((__v >> __bits_per_word) &
                          (__storage_type(1) << (_Size - __bits_per_word)) - 1);
  }
}

@winner245 winner245 force-pushed the simplify_bitset_init branch 2 times, most recently from a1b6ecb to e95d208 Compare February 27, 2025 15:26
@winner245 winner245 force-pushed the simplify_bitset_init branch from e95d208 to cf0d4d7 Compare February 27, 2025 16:16
@winner245
Copy link
Contributor Author

winner245 commented Feb 27, 2025

After several attempts with a stage1 (generic-gcc, gcc-14, g++-14) CI failure, I realized that we still have to keep the initialization of the __first_ array in the constructor initializer list, because the constructor is constexpr since C++11, whereas C++11 constexpr constructor body does not allow statements other than the following: null statements, static_assert, typedef or alias declarations, using declarations/directives (https://en.cppreference.com/w/cpp/language/constexpr). Thus, gcc is unable to compile it (A reproducer) and that's why the generic-gcc CI fails.

Given this limitation, it seems that we still have to move the initialization back to the constructor initializer list. Since C++03 does not support {} in the constructor initializer list, I think we still need to keep the __init function for C++03. Please let me know if you have different opinions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants