Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement bool compression (#7233) #7701

Merged
merged 1 commit into from
Feb 17, 2025

Conversation

dbeck
Copy link
Contributor

@dbeck dbeck commented Feb 12, 2025

Reusing the existing Simple8bRLE algorithm for bools. I added a new
compression type specifically for this case called 'bool'.

A new GUC is introduced so we can revert to the previous, array
compression for bools: timescaledb.enable_bool_compression.
It defaults to false.

To enable bool compression set the GUC:

timescaledb.enable_bool_compression=true

Fixes #7233

Copy link

codecov bot commented Feb 12, 2025

Codecov Report

Attention: Patch coverage is 81.87311% with 60 lines in your changes missing coverage. Please review.

Project coverage is 81.93%. Comparing base (59f50f2) to head (d9e4d1b).
Report is 769 commits behind head on main.

Files with missing lines Patch % Lines
tsl/test/src/compression_unit_test.c 77.20% 0 Missing and 31 partials ⚠️
tsl/src/compression/algorithms/bool_compress.c 86.66% 4 Missing and 20 partials ⚠️
tsl/src/compression/compression.c 70.00% 3 Missing ⚠️
tsl/test/src/compression_sql_test.c 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7701      +/-   ##
==========================================
+ Coverage   80.06%   81.93%   +1.86%     
==========================================
  Files         190      247      +57     
  Lines       37181    45447    +8266     
  Branches     9450    11355    +1905     
==========================================
+ Hits        29770    37235    +7465     
- Misses       2997     3738     +741     
- Partials     4414     4474      +60     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

{
BoolDecompressionIterator *iterator = palloc(sizeof(*iterator));
decompression_iterator_init(iterator,
(void *) PG_DETOAST_DATUM(bool_compressed),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make detoasting the caller's responsibility? We have some faster functions for that in detoaster.c. You can change it to Assert(VARATT_IS_4B_U(bool_compresed)) -- probably, I forgot the TOAST types again...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I make this the caller's responsibility, then it will be inconsistent with the other compression algorithms. Do you want to change them too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I raised a separate issue to address this across all compression algorithms: #7716

{
CompressedDataHeaderFields;
uint8 has_nulls; /* 1 if this has a NULLs bitmap after the values, 0 otherwise */
uint8 padding[2];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other compression algorithms had this alignment to 8 bytes. Do we not need it?

{
compressor->has_nulls = true;
simple8brle_compressor_append(&compressor->values, compressor->last_value);
simple8brle_compressor_append(&compressor->nulls, 1);
Copy link
Member

@akuzm akuzm Feb 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're adding 1 for a null value, so this means you'll have to negate it after decompression to match the Arrow format? Let's switch to validity bitmap instead, so that we decompress straight to Arrow?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it is in the other compression algorithms, but happy to change this.

Comment on lines +171 to +172
.bool_compressor_append = tsl_bool_compressor_append,
.bool_compressor_finish = tsl_bool_compressor_finish,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need them in cross-module functions, they shouldn't be called from the Apache part, right? For testing, you can add the TSL functions directly like it's done for ts_read_compressed_data_file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is again for consistency with the other compression algos.

@dbeck dbeck force-pushed the bool-compress-reindent branch 2 times, most recently from e9372f0 to d566381 Compare February 17, 2025 13:51
Copy link
Member

@akuzm akuzm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think we have discussed all the important points here and on Slack, so let's proceed with it.

@dbeck dbeck force-pushed the bool-compress-reindent branch 3 times, most recently from ebc550e to c756e51 Compare February 17, 2025 15:10
@dbeck dbeck force-pushed the bool-compress-reindent branch 2 times, most recently from 8333066 to 032cc4a Compare February 17, 2025 15:59
@dbeck dbeck force-pushed the bool-compress-reindent branch from 032cc4a to 76d7bb8 Compare February 17, 2025 16:21
@dbeck dbeck enabled auto-merge (rebase) February 17, 2025 17:40
Reusing the existing Simple8bRLE algorithm for bools. I added a new
compression type specifically for this case called 'bool'.

A new GUC is introduced so we can revert to the previous, array
compression for bools: `timescaledb.enable_bool_compression`.
It defaults to `false`.

To enable bool compression set the GUC:

`timescaledb.enable_bool_compression=true`

Fixes timescale#7233
@dbeck dbeck force-pushed the bool-compress-reindent branch from 76d7bb8 to d9e4d1b Compare February 17, 2025 18:35
@dbeck dbeck merged commit d7a8b4b into timescale:main Feb 17, 2025
67 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

implement bool compression
3 participants