Commit
The main benefit of this format for our users is that it decompresses much faster than bzip2, even at high compression levels. At level 19 it also compresses better than bzip2 for our files. Hopefully the compression time is still acceptable; if not, we can lower the level so as not to overwork the server, at the price of slightly bigger files.

On my i7-8700K, unarchiving sentences.tar.bz2 takes 15.5 s, compared to 994 ms for sentences.csv.zst compressed at level 19. The file is 183 MiB, compared to 197 MiB with bzip2. We could go down to 167 MiB with level 22 (which decompresses in 941 ms), but compression time gets much higher from there, and I’m not sure it is worth it.

The only downside I see to this change is that user automation will have to be updated, so we should probably announce it before deploying.

I’ve also removed the tar step, which only added overhead since we only ever created one archive per file.
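To put the figures above side by side, here is a small sketch that derives the decompression speedup and the size saving relative to bzip2. The numbers are the ones quoted in this message; the dictionary and function names are only illustrative and are not part of the changed files.

```python
# Figures quoted in the commit message (i7-8700K, sentences export).
# Names below are hypothetical helpers for illustration only.
BZIP2 = {"seconds": 15.5, "mib": 197}     # sentences.tar.bz2
ZSTD_19 = {"seconds": 0.994, "mib": 183}  # sentences.csv.zst, level 19
ZSTD_22 = {"seconds": 0.941, "mib": 167}  # sentences.csv.zst, level 22


def speedup(baseline, candidate):
    """How many times faster the candidate decompresses than the baseline."""
    return baseline["seconds"] / candidate["seconds"]


def size_saving_percent(baseline, candidate):
    """Size reduction of the candidate relative to the baseline, in percent."""
    return 100 * (1 - candidate["mib"] / baseline["mib"])


if __name__ == "__main__":
    for label, candidate in (("level 19", ZSTD_19), ("level 22", ZSTD_22)):
        print(
            f"{label}: {speedup(BZIP2, candidate):.1f}x faster to decompress, "
            f"{size_saving_percent(BZIP2, candidate):.1f}% smaller than bzip2"
        )
```

Going from level 19 to level 22 roughly doubles the size saving while barely changing decompression time, so the trade-off is almost entirely about compression time on the server, which these figures do not capture.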
Showing 5 changed files with 60 additions and 42 deletions.