-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support comparison semantics for batch serialize/deserialize of Column #9756
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
a311faa
to
eeb1ac3
Compare
Signed-off-by: guo-shaoge <[email protected]>
eeb1ac3
to
c07d13a
Compare
e19c851
to
7c725cb
Compare
Signed-off-by: guo-shaoge <[email protected]>
7c725cb
to
84ee65b
Compare
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
/retest |
Signed-off-by: guo-shaoge <[email protected]>
3878b4b
to
47cdf91
Compare
[LGTM Timeline notifier]Timeline:
|
38e03ed
to
8070321
Compare
06ff1d4
to
7572680
Compare
} | ||
else | ||
{ | ||
inline_memcpy(pos[i], &data[array_offsets[start + i - 1]], len * sizeof(T)); | ||
if (len <= 4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just a simple optimization. If the length is very small, copying them one by one is faster than std::memcpy.
dbms/src/Columns/ColumnString.cpp
Outdated
{ | ||
assert(sizeAt(i) >= 1); | ||
// Minus 1 because of terminating zero. | ||
byte_size[i] += sizeof(UInt32) + (sizeAt(i) - 1) * max_bytes_one_char; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
size * max_bytes_one_char
may waste lots of memory. For example, a utf8 character is 3 bytes, max_bytes_one_char
is 4 bytes. 4 bytes is enough but here need 12 bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
dbms/src/Columns/ColumnString.cpp
Outdated
const void * src = &chars[offsetAt(start + i)]; | ||
if constexpr (has_collator) | ||
{ | ||
auto sort_key = collator->sortKey(reinterpret_cast<const char *>(src), str_size - 1, *sort_key_container); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sortKey
is a virtual function. How about adding a batch version to reduce the overhead of virtual functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some marcos so we can static_cast collator to Derived class and call its method
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
7572680
to
25406de
Compare
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
Signed-off-by: guo-shaoge <[email protected]>
{ | ||
size_t prev_size = offsets.size(); | ||
size_t char_size = chars.size(); | ||
size_t size = pos.size(); | ||
|
||
#ifdef TIFLASH_ENABLE_AVX_SUPPORT | ||
if (use_nt_align_buffer) | ||
if constexpr (!add_terminating_zero) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a terminating zero in serializeToPos
? If so, this branch parameter can be removed and align_buffer can be used as original version.
size_t maxBytesForOneChar() const override | ||
{ | ||
// Every char have 8 uint16 at most. | ||
return 8 * sizeof(uint16_t); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems too large. Is it necessary to use 16 bytes?
this new interface is not needed anymore. |
What problem does this PR solve?
Issue Number: close #9761
Problem Summary: #9553 add serialize/deserialize interface by column-wise, but it didn't handle collator for ColumnString and real copy format for ColumnDecimal([sign, limb_count, limb_data])
What is changed and how it works?
Check List
Tests
Side effects
Documentation
Release note