Commit
Port ARM inflate performance improvement patches (chunk SIMD, read64le) (cloudflare#22)

* When windowBits is zero, the size of the sliding window comes from the zlib header. The allowed values of the four-bit field are 0..7, but when windowBits is zero, values greater than 7 are permitted and acted upon, resulting in large, mostly unused memory allocations. This fix rejects such invalid zlib headers.

* Add option to not compute or check check values.

The undocumented (except in these commit comments) function inflateValidate(strm, check) can be called after an inflateInit(), inflateInit2(), or inflateReset2() with check equal to zero to turn off the check value (CRC-32 or Adler-32) computation and comparison. Calling with check not equal to zero turns checking back on. This should only be called immediately after the init or reset function. inflateReset() does not change the state, so a previous inflateValidate() setting will remain in effect.

This also turns off validation of the gzip header CRC when present. This should only be used when a zlib or gzip stream has already been checked, and repeated decompressions of the same stream no longer need to be validated.

* This verifies that the state has been initialized, that it is the expected type of state, deflate or inflate, and that at least the first several bytes of the internal state have not been clobbered.

* Use macros to represent magic numbers

This combines two patches that improve the readability and maintainability of the code by turning magic numbers into #defines. Based on Chris Blume's (cblume@chromium) patches for zlib chromium:

  8888511 - "Zlib: Use defines for inffast"
  b9c1566 - "Share inffast names in zlib"

These patches are needed when introducing chunk SIMD NEON enhancements.
Signed-off-by: Janakarajan Natarajan <[email protected]>

* Port inflate chunk SIMD NEON patches for cloudflare

Based on 2 patches from zlib chromium fork:

  * Adenilson Cavalcanti ([email protected]) 3060dcb - "zlib: inflate using wider loads and stores"
  * Noel Gordon ([email protected]) 64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"

The two patches combined provide around 5-25% increase in inflate performance, based on the workload, when checked with a modified zpipe.c and the Silesia corpus.

Signed-off-by: Janakarajan Natarajan <[email protected]>

* Increase inflate speed: read decode input into a uint64_t

Update the chunk-copy code with a wide input data reader, which consumes input in 64-bit (8 byte) chunks. Update inflate_fast_chunk_() to use the wide reader.

Based on Noel Gordon's ([email protected]) patch for the zlib chromium fork:

  8a8edc1 - "Increase inflate speed: read decoder input into a uint64_t"

This patch provides 7-10% inflate performance improvement when tested with a modified zpipe.c and the Silesia corpus.

Signed-off-by: Janakarajan Natarajan <[email protected]>

Co-authored-by: Mark Adler <[email protected]>
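
For readers unfamiliar with the wide-reader idea described in the last item, here is a minimal sketch of a bit reader that refills its accumulator from a 64-bit little-endian load. It is an illustration only and is not taken from the patch: the names read64le_sketch, bitreader_sketch, refill_sketch and take_bits_sketch, the refill threshold of 16 bits, and the 6-byte advance are all assumptions made for this example. The caller must guarantee at least 8 readable bytes at the read position.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 bytes as a little-endian 64-bit value.
 * memcpy sidesteps alignment and aliasing issues; compilers usually
 * turn it into a single load. The byte swap covers big-endian targets
 * on GCC/Clang; other compilers are assumed little-endian here. */
static uint64_t read64le_sketch(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    v = __builtin_bswap64(v);
#endif
    return v;
}

/* Hypothetical bit-reader state, lowest-order bits consumed first. */
typedef struct {
    const unsigned char *next; /* next input byte not yet counted in bits */
    uint64_t hold;             /* bit accumulator */
    unsigned bits;             /* number of bits in hold counted as valid */
} bitreader_sketch;

/* Refill with one wide load instead of a byte-at-a-time loop.
 * Only 48 of the 64 loaded bits are counted as consumed, so the shift
 * never discards input; bits above the counted range are simply ORed
 * in again (with identical values) by the next refill.
 * Precondition: at least 8 readable bytes at r->next. */
static void refill_sketch(bitreader_sketch *r)
{
    if (r->bits < 16) {
        r->hold |= read64le_sketch(r->next) << r->bits;
        r->next += 6;
        r->bits += 48;
    }
}

/* Extract the next n bits (1 <= n <= 15) from the stream. */
static unsigned take_bits_sketch(bitreader_sketch *r, unsigned n)
{
    unsigned v;
    refill_sketch(r);
    v = (unsigned)(r->hold & ((1u << n) - 1u));
    r->hold >>= n;
    r->bits -= n;
    return v;
}
```

A decoder loop built this way touches memory once per refill rather than once per byte, which is the effect the commit message attributes its speedup to. A real decoder also needs a slower byte-wise path for the final few bytes of input, where an 8-byte load would run past the buffer.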