Snappy compression for serialization_sorter.h ??? #186

hendrikmuhs · 2015-05-21T14:03:47Z

Hi,

I am using serialization_sorter.h to sort huge amounts of key-value data (strings, variable length).

Is it possible, and do you think it makes sense, to implement Snappy compression for it? What would be the best place to do it?

I would think here:
https://github.com/thomasmoelhave/tpie/blob/master/tpie/serialization_stream.h

I also considered compressing at least the values myself in serialize and unserialize, but since my values are only about 50-400 characters long, compressing each of these short strings separately would not be very effective.

I think block-wise compression would make more sense.
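The point is that the compressor should see many records at once rather than one short value at a time. Here is a minimal sketch of the packing side, assuming a hypothetical `BlockPacker` helper (not part of TPIE). It packs length-prefixed key/value records into fixed-capacity blocks and hands each full block to a callback. In a real implementation that callback is where Snappy would be called, e.g. `snappy::Compress`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of TPIE): packs variable-length key/value
// records into blocks so the compressor sees many records at a time.
class BlockPacker {
public:
    using BlockSink = std::function<void(const std::string&)>;

    BlockPacker(std::size_t block_capacity, BlockSink sink)
        : capacity_(block_capacity), sink_(std::move(sink)) {}

    // Appends one record as [u32 key_len][key][u32 value_len][value]
    // (native byte order). Starts a new block if the record would not fit;
    // a record larger than the capacity gets a block of its own.
    void add(const std::string& key, const std::string& value) {
        std::size_t needed = 2 * sizeof(std::uint32_t) + key.size() + value.size();
        if (!buffer_.empty() && buffer_.size() + needed > capacity_) flush();
        put(key);
        put(value);
    }

    // Emits the current (possibly partial) block, if any. A real
    // implementation would compress the block here, e.g. with Snappy.
    void flush() {
        if (buffer_.empty()) return;
        sink_(buffer_);
        buffer_.clear();
    }

private:
    void put(const std::string& s) {
        assert(s.size() <= std::numeric_limits<std::uint32_t>::max());
        std::uint32_t len = static_cast<std::uint32_t>(s.size());
        buffer_.append(reinterpret_cast<const char*>(&len), sizeof(len));
        buffer_.append(s);
    }

    std::size_t capacity_;
    BlockSink sink_;
    std::string buffer_;
};
```

With a capacity of, say, 64 KiB, each compressor call would cover hundreds of 50-400 byte values, so repetition across records becomes visible to the compressor.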

(I would implement it myself and send you a PR)

antialize · 2015-05-21T15:51:52Z

It would definitely make sense to compress the blocks instead of the individual text strings. If @Mortal has time, perhaps he can tell us what the best approach would be. If you want to implement this, that is good; we can probably allocate some time for @svendcsvendsen to help you.

svendcs · 2015-05-21T15:57:05Z

Using Snappy for compression in the serialization_sorter definitely makes a lot of sense for situations like this. @Mortal implemented the serialization code and knows the most about it; however, I'll definitely be available if you need some help with the implementation.

Mortal · 2015-05-22T08:12:49Z

Actually, block-wise compression makes more sense for serialization streams than for ordinary streams, since serialization streams do not support seek.

The four stream classes serialization{_reverse,}{_reader,_writer} (i.e. serialization_reader, serialization_writer, serialization_reverse_reader and serialization_reverse_writer) are derived from bits::serialization_{reader,writer}_base, and the two base classes implement read_block and write_block, which the stream classes use more or less as a black box.
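Because the stream classes only go through read_block and write_block, compression could in principle live entirely in that layer. Below is a minimal sketch of that layering. The names `compressed_block_base` and `toy_serialization_writer` are hypothetical and are not TPIE's actual classes. Simple run-length encoding stands in for Snappy so the example runs without extra libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in codec: run-length encoding as (count, byte) pairs. This is NOT
// Snappy; it only keeps the sketch free of external dependencies.
std::string rle_compress(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

std::string rle_decompress(const std::string& in) {
    assert(in.size() % 2 == 0);
    std::string out;
    for (std::size_t i = 0; i < in.size(); i += 2) {
        std::size_t count = static_cast<unsigned char>(in[i]);
        out.append(count, in[i + 1]);
    }
    return out;
}

// Hypothetical base class: the only code that touches storage. Derived
// stream classes call write_block/read_block and never see compressed bytes.
class compressed_block_base {
public:
    void write_block(const std::string& block) { storage_.push_back(rle_compress(block)); }
    std::string read_block(std::size_t index) const { return rle_decompress(storage_.at(index)); }
    std::size_t block_count() const { return storage_.size(); }
    std::size_t stored_bytes() const {
        std::size_t total = 0;
        for (const auto& b : storage_) total += b.size();
        return total;
    }
private:
    std::vector<std::string> storage_;  // stands in for the file on disk
};

// Hypothetical writer: buffers serialized bytes and hands fixed-size blocks
// to write_block, unaware of whether or how they are compressed.
class toy_serialization_writer : public compressed_block_base {
public:
    explicit toy_serialization_writer(std::size_t block_size) : block_size_(block_size) {
        assert(block_size_ > 0);
    }
    void write(const std::string& bytes) {
        buffer_ += bytes;
        while (buffer_.size() >= block_size_) {
            write_block(buffer_.substr(0, block_size_));
            buffer_.erase(0, block_size_);
        }
    }
    void close() {
        if (!buffer_.empty()) write_block(buffer_);
        buffer_.clear();
    }
private:
    std::size_t block_size_;
    std::string buffer_;
};
```

Keeping the codec behind write_block/read_block is what would let the derived stream classes stay unchanged. The trade-off is that block offsets on disk are no longer a simple multiple of the block size.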

Compressed serialization streams should ideally be implemented to use the compressor thread, passing it read and write requests that support both forward and backward reading -- exactly what the serialization_reverse_reader needs.
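Compressed blocks have variable size, so a reverse reader cannot work out where the previous block starts from a fixed block size. One common way to support both directions is to store each compressed block's length both before and after its payload (boundary tags). A forward reader then uses the header and a reverse reader uses the trailer. The sketch below assumes this hypothetical layout, which is not TPIE's actual file format, and uses an in-memory string in place of the file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

constexpr std::size_t kLen = sizeof(std::uint32_t);

// Hypothetical framing: [u32 len][payload][u32 len], native byte order.
// The payload would be a compressed block.
void append_frame(std::string& file, const std::string& payload) {
    std::uint32_t len = static_cast<std::uint32_t>(payload.size());
    file.append(reinterpret_cast<const char*>(&len), kLen);
    file.append(payload);
    file.append(reinterpret_cast<const char*>(&len), kLen);
}

static std::uint32_t load_u32(const std::string& file, std::size_t pos) {
    assert(pos + kLen <= file.size());
    std::uint32_t v;
    std::memcpy(&v, file.data() + pos, kLen);
    return v;
}

// Forward reading: the header tells us how far to skip.
std::vector<std::string> read_forward(const std::string& file) {
    std::vector<std::string> out;
    std::size_t pos = 0;
    while (pos < file.size()) {
        std::uint32_t len = load_u32(file, pos);
        out.push_back(file.substr(pos + kLen, len));
        pos += kLen + len + kLen;  // header, payload, trailer
    }
    return out;
}

// Backward reading: the trailer tells us where the payload starts.
std::vector<std::string> read_backward(const std::string& file) {
    std::vector<std::string> out;
    std::size_t end = file.size();
    while (end > 0) {
        std::uint32_t len = load_u32(file, end - kLen);
        std::size_t start = end - kLen - len;  // first payload byte
        out.push_back(file.substr(start, len));
        end = start - kLen;  // step over this frame's header
    }
    return out;
}
```

The extra cost is 4 bytes per block, which is negligible next to block sizes in the kilobyte range.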

Perhaps process_read_request and process_write_request are a good place to start learning how the compressed streams work.
