Block compression support for CRAM writer #320
Merged
This PR adds support for block compression to the CRAM writer.

The primary features are as follows:

- Supported compression methods: `:raw`, `:gzip`, `:bzip`, `:lzma`, and `:best`, which automatically selects the compression method with the highest compression rate
- Data series are compressed with `:gzip` by default, although the default compression method for each data series may be tuned in the future
- Tags are compressed with `:best` by default
- The defaults for data series and tags can be overridden with the `ds-compressor-overrides` and `tag-compressor-overrides` options, respectively (explained below)

Block compression is applied along with record encoding rather than after all the records have been fully encoded.
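
As a rough illustration of what `:best` is described as doing (try several methods and keep the smallest result), here is a minimal sketch. It is not the CRAM writer's implementation: `gzip-bytes`, `compressors`, and `pick-smallest` are hypothetical names, and only `:raw` and `:gzip` are modelled because the JDK standard library ships no bzip2 or LZMA codec.

```clojure
(import '(java.io ByteArrayOutputStream)
        '(java.util.zip GZIPOutputStream))

(defn gzip-bytes
  "Compresses a byte array with gzip using java.util.zip."
  ^bytes [^bytes data]
  (let [out (ByteArrayOutputStream.)]
    ;; Closing the gzip stream flushes the trailer into `out`.
    (with-open [gz (GZIPOutputStream. out)]
      (.write gz data 0 (alength data)))
    (.toByteArray out)))

;; Hypothetical method table; :raw leaves the block untouched.
(def compressors
  {:raw  identity
   :gzip gzip-bytes})

(defn pick-smallest
  "Runs every candidate method on `data` and returns
  [method compressed-bytes] for the smallest output."
  [candidates ^bytes data]
  (->> candidates
       (map (fn [m] [m ((compressors m) data)]))
       (apply min-key (fn [[_ bs]] (alength ^bytes bs)))))

;; Highly repetitive input favours gzip; a single byte is cheaper stored raw.
(first (pick-smallest [:raw :gzip] (byte-array 1000))) ;=> :gzip
(first (pick-smallest [:raw :gzip] (byte-array 1)))    ;=> :raw
```

The second call shows why a "try everything" mode can still end up choosing `:raw`: fixed header and trailer overhead can outweigh any savings on very small blocks.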

Compressor overrides

To override the compression method for data series and tags, specify the `ds-compressor-overrides` and `tag-compressor-overrides` options to the CRAM writer.

The full specification of `ds-compressor-overrides`/`tag-compressor-overrides` is somewhat intricate. The description here does not aim to be exhaustive but rather to provide the big picture and offer some practical examples of usage.

ds-compressor-overrides

`ds-compressor-overrides` is a function that takes a keyword representing a data series and returns a keyword representing a compression method. Here are some examples:

- Compress the `BF` and `CF` data series with `:bzip` and leave the others with their default methods
- … `:bzip`
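
The original code for these examples is not preserved here, so the following is only a sketch of the first one. It assumes data series are passed as keywords such as `:BF`, and that returning `nil` means "keep the default"; both points should be checked against the test code. The names `bf-cf-overrides`, `bf-cf-overrides-fn`, and `writer-options` are hypothetical.

```clojure
;; A Clojure map is itself a function of its keys, so it can act as the
;; override function: :BF and :CF yield :bzip, anything else yields nil.
(def bf-cf-overrides
  {:BF :bzip
   :CF :bzip})

;; The same override written as an explicit function.
(defn bf-cf-overrides-fn [ds]
  (case ds
    (:BF :CF) :bzip
    nil)) ; assumption: nil leaves the default method in place

;; Assumed shape of the option map handed to the CRAM writer; the option
;; name comes from the PR text, its use as a keyword key is an assumption.
(def writer-options
  {:ds-compressor-overrides bf-cf-overrides})
```

Using a plain map keeps simple overrides declarative, while the function form leaves room for arbitrary logic.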

The function can also return a set of compression method keywords. In this case, the compression method with the highest compression rate will be selected from the specified methods:

- Compress the `BF` series with both `:bzip` and `:lzma`, and choose the more efficient one

Additionally, the function can return another function to further condition the compression method based on the codec:

- Compress the blocks for the `:external` codec using the `:bzip` compression method
- Compress the blocks for the `:len-encoding` of the `:byte-array-len` codec using `:lzma` and the blocks for the `:val-encoding` using `:bzip`
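
The set-valued and codec-conditioned forms described in this section might look roughly like the sketch below. It assumes the codec is passed to the inner function as a keyword such as `:external`, and it applies the codec condition to every data series purely for illustration. The names `set-override` and `codec-override` are hypothetical, and the nested shape needed for the `:byte-array-len` example is not shown because it cannot be inferred from the description alone.

```clojure
;; Set-valued override: for :BF, both :bzip and :lzma are candidates,
;; and the writer is described as keeping the more efficient one.
(def set-override
  {:BF #{:bzip :lzma}})

;; Codec-conditioned override: for any data series, return a second
;; function (here a map) from codec keyword to compression method.
(def codec-override
  (constantly {:external :bzip}))

(set-override :BF)                 ;=> #{:bzip :lzma}
((codec-override :BF) :external)   ;=> :bzip
```

Because maps and sets are ordinary values, such overrides can be built up or merged with regular collection functions before being passed to the writer.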

For more detailed usage, see the test code for `ds-compressor-overrides`.

tag-compressor-overrides

`tag-compressor-overrides` works similarly to `ds-compressor-overrides`, but can also add conditions based on tag types by returning a function:

- … `:bzip`
- Compress the `XA:c` tag with `:gzip` and the `XA:i` tag with `:bzip`
- Compress the blocks for the `:val-encoding` of all the tags of type `Z` using `:bzip`
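
A sketch of the `XA:c`/`XA:i` example follows. How tags and tag types are represented when passed to the override function is not stated here, so this assumes the tag arrives as a keyword (`:XA`) and the type as a character (`\c`, `\i`); if the writer uses a different representation, the keys need to change accordingly. The name `xa-overrides` is hypothetical.

```clojure
;; Outer map: tag -> inner function. Inner map: tag type -> method.
;; Assumption: tags are keywords and tag types are characters.
(def xa-overrides
  {:XA {\c :gzip
        \i :bzip}})

((xa-overrides :XA) \c) ;=> :gzip
((xa-overrides :XA) \i) ;=> :bzip
(xa-overrides :NM)      ;=> nil (assumed to mean "keep the default")
```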

For more detailed usage, see the test code for `tag-compressor-overrides`.