-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement bool compression (#7233) #7701
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7701 +/- ##
==========================================
+ Coverage 80.06% 81.93% +1.86%
==========================================
Files 190 247 +57
Lines 37181 45447 +8266
Branches 9450 11355 +1905
==========================================
+ Hits 29770 37235 +7465
- Misses 2997 3738 +741
- Partials 4414 4474 +60 ☔ View full report in Codecov by Sentry. |
{ | ||
BoolDecompressionIterator *iterator = palloc(sizeof(*iterator)); | ||
decompression_iterator_init(iterator, | ||
(void *) PG_DETOAST_DATUM(bool_compressed), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make detoasting the caller's responsibility? We have some faster functions for that in detoaster.c
. You can change it to Assert(VARATT_IS_4B_U(bool_compresed))
-- probably, I forgot the TOAST types again...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I make this the caller's responsibility, then it will be inconsistent with the other compression algorithms. Do you want to change them too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I raised a separate issue to address this across all compression algorithms: #7716
f1bc55c
to
506a65c
Compare
{ | ||
CompressedDataHeaderFields; | ||
uint8 has_nulls; /* 1 if this has a NULLs bitmap after the values, 0 otherwise */ | ||
uint8 padding[2]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other compression algorithms had this alignment to 8 bytes. Do we not need it?
{ | ||
compressor->has_nulls = true; | ||
simple8brle_compressor_append(&compressor->values, compressor->last_value); | ||
simple8brle_compressor_append(&compressor->nulls, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're adding 1 for a null value, so this means you'll have to negate it after decompression to match the Arrow format? Let's switch to validity bitmap instead, so that we decompress straight to Arrow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is how it is in the other compression algorithms, but happy to change this.
.bool_compressor_append = tsl_bool_compressor_append, | ||
.bool_compressor_finish = tsl_bool_compressor_finish, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need them in cross-module functions, they shouldn't be called from the Apache part, right? For testing, you can add the TSL functions directly like it's done for ts_read_compressed_data_file
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is again for consistency with the other compression algos.
e9372f0
to
d566381
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I think we have discussed all the important points here and on Slack, so let's proceed with it.
ebc550e
to
c756e51
Compare
8333066
to
032cc4a
Compare
032cc4a
to
76d7bb8
Compare
Reusing the existing Simple8bRLE algorithm for bools. I added a new compression type specifically for this case called 'bool'. A new GUC is introduced so we can revert to the previous, array compression for bools: `timescaledb.enable_bool_compression`. It defaults to `false`. To enable bool compression set the GUC: `timescaledb.enable_bool_compression=true` Fixes timescale#7233
76d7bb8
to
d9e4d1b
Compare
Reusing the existing Simple8bRLE algorithm for bools. I added a new
compression type specifically for this case called 'bool'.
A new GUC is introduced so we can revert to the previous, array
compression for bools:
timescaledb.enable_bool_compression
.It defaults to
false
.To enable bool compression set the GUC:
timescaledb.enable_bool_compression=true
Fixes #7233