Port ARM inflate performance improvement patches (chunk SIMD, read64le) (cloudflare#22)

* When windowBits is zero, the size of the sliding window comes from
the zlib header.  The allowed values of the four-bit field are
0..7, but when windowBits is zero, values greater than 7 are
permitted and acted upon, resulting in large, mostly unused memory
allocations.  This fix rejects such invalid zlib headers.
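
The check described in this item can be sketched as a standalone function. This is a minimal illustration, not the zlib source: `header_window_bits` and its return convention are hypothetical. Only the header layout is taken from the zlib format (RFC 1950): compression method in the low nibble of the first byte, the four-bit window field in the high nibble, and the 16-bit header being a multiple of 31.

```c
#include <stdint.h>

/* Hypothetical helper, not part of zlib: returns the window size in bits
 * selected by a two-byte zlib header, or -1 if the header is invalid.
 * requested_wbits is the caller's windowBits (0 means "use the header"). */
static int header_window_bits(uint8_t cmf, uint8_t flg, unsigned requested_wbits)
{
    unsigned header_wbits;

    if (((unsigned)cmf * 256u + flg) % 31u != 0)
        return -1;                  /* header check bits are wrong */
    if ((cmf & 0x0f) != 8)
        return -1;                  /* compression method is not deflate */

    header_wbits = (unsigned)(cmf >> 4) + 8;    /* four-bit field + 8 */
    if (header_wbits > 15)
        return -1;                  /* field > 7: reject even if requested_wbits == 0 */
    if (requested_wbits != 0 && header_wbits > requested_wbits)
        return -1;                  /* stream needs a larger window than allowed */

    return (int)header_wbits;
}
```

With this sketch, the common header bytes 0x78 0x9c yield a 15-bit window, while a header whose window field is 8 is rejected regardless of the requested windowBits.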

* Add option to not compute or check check values.

The undocumented (except in these commit comments) function
inflateValidate(strm, check) can be called after an inflateInit(),
inflateInit2(), or inflateReset2() with check equal to zero to
turn off the check value (CRC-32 or Adler-32) computation and
comparison. Calling with check not equal to zero turns checking
back on. This should only be called immediately after the init or
reset function. inflateReset() does not change the state, so a
previous inflateValidate() setting will remain in effect.

This also turns off validation of the gzip header CRC when
present.

This should only be used when a zlib or gzip stream has already
been checked, and repeated decompressions of the same stream no
longer need to be validated.

* This verifies that the state has been initialized, that it is the
expected type of state, deflate or inflate, and that at least the
first several bytes of the internal state have not been clobbered.
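
The three tests listed in this item can be sketched with simplified stand-in types. Everything named `demo_*` or `DEMO_*` is hypothetical and much smaller than the real zlib structures; the point is the pattern of a back-pointer plus a range check on the first fields of the state.

```c
#include <stddef.h>

/* Hypothetical stand-ins for z_stream and the internal state.
 * The modes start at an arbitrary non-zero value so that a zeroed
 * or foreign struct fails the range test. */
enum demo_mode { DEMO_HEAD = 16180, DEMO_BODY, DEMO_DONE };

struct demo_stream;

struct demo_state {
    struct demo_stream *strm;   /* back-pointer to the owning stream */
    enum demo_mode mode;        /* must lie within the known range */
};

struct demo_stream {
    struct demo_state *state;
};

/* Returns 1 if the state looks invalid, 0 if it passes all tests. */
static int demo_state_check(const struct demo_stream *strm)
{
    const struct demo_state *state;

    if (strm == NULL || strm->state == NULL)
        return 1;               /* never initialized */
    state = strm->state;
    if (state->strm != strm)
        return 1;               /* belongs to another stream, or clobbered */
    if (state->mode < DEMO_HEAD || state->mode > DEMO_DONE)
        return 1;               /* wrong type of state, or clobbered */
    return 0;
}
```

Giving the inflate and deflate states disjoint mode ranges is one way such a check can tell the two kinds of state apart.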

* Use macros to represent magic numbers

This combines two patches which help in improving the readability and
maintainability of the code by making magic numbers into #defines.

Based on Chris Blume's (cblume@chromium) patches for zlib chromium:
8888511 - "Zlib: Use defines for inffast"
b9c1566 - "Share inffast names in zlib"

These patches are needed when introducing chunk SIMD NEON enhancements.

Signed-off-by: Janakarajan Natarajan <[email protected]>
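
As a small illustration of the idea, the fast-path guard in inflate compares the remaining input and output against the literals 6 and 258. The sketch below names those literals; the macro and function names are illustrative assumptions, not necessarily the ones the patches introduce.

```c
#include <stddef.h>

/* Illustrative names (assumptions): the minimum buffer space a fast
 * decode loop needs before it may run without re-checking the buffers. */
#define FAST_MIN_INPUT  6    /* input bytes that must be available */
#define FAST_MIN_OUTPUT 258  /* longest deflate match, in bytes */

/* Before: if (have >= 6 && left >= 258) ... */
static int can_use_fast_path(size_t have, size_t left)
{
    return have >= FAST_MIN_INPUT && left >= FAST_MIN_OUTPUT;
}
```

Naming the limits means a variant decoder that needs more input slack only has to change one definition.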

* Port inflate chunk SIMD NEON patches for cloudflare

Based on 2 patches from zlib chromium fork:

* Adenilson Cavalcanti ([email protected])
  3060dcb - "zlib: inflate using wider loads and stores"

* Noel Gordon ([email protected])
  64ffef0 - "Improve zlib inflate speed by using SSE2 chunk copy"

The two patches combined provide around 5-25% increase in inflate
performance, based on the workload, when checked with a modified
zpipe.c and the Silesia corpus.

Signed-off-by: Janakarajan Natarajan <[email protected]>
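
The chunk-copy idea behind these patches can be sketched in portable C. This is a simplified illustration under stated assumptions, not the ported code: `chunk_copy` and `CHUNK` are hypothetical names, a fixed-size memcpy stands in for the NEON or SSE2 load and store, and overlapping copies (short match distances) are not handled.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 16    /* bytes in one 128-bit vector register */

/* Hypothetical sketch. Copies len (> 0) bytes from `from` to `out` using
 * only CHUNK-sized moves and returns out + len.
 * Assumptions: the regions do not overlap, and both buffers have at least
 * CHUNK accessible bytes even when len < CHUNK, because the first move
 * always transfers a whole chunk. */
static unsigned char *chunk_copy(unsigned char *out,
                                 const unsigned char *from, size_t len)
{
    /* The first move covers the remainder (1..CHUNK bytes), so everything
     * left afterwards is a whole number of chunks. */
    size_t first = (len - 1) % CHUNK + 1;

    memcpy(out, from, CHUNK);
    out += first;
    from += first;
    len -= first;

    while (len > 0) {
        memcpy(out, from, CHUNK);
        out += CHUNK;
        from += CHUNK;
        len -= CHUNK;
    }
    return out;
}
```

The design trades a few harmless extra bytes written past short copies for a loop with no byte-granular tail, which is why the caller has to guarantee slack at the end of the output buffer.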

* Increase inflate speed: read decode input into a uint64_t

Update the chunk-copy code with a wide input data reader, which consumes
input in 64-bit (8 byte) chunks. Update inflate_fast_chunk_() to use the
wide reader.

Based on Noel Gordon's ([email protected]) patch for the zlib chromium fork
8a8edc1 - "Increase inflate speed: read decoder input into a uint64_t"

This patch provides 7-10% inflate performance improvement when tested with a
modified zpipe.c and the Silesia corpus.

Signed-off-by: Janakarajan Natarajan <[email protected]>
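
A minimal sketch of a wide input reader, assuming a 64-bit bit accumulator that is topped up from an 8-byte little-endian load. The names `bit_reader`, `load64le`, and `take_bits` are hypothetical, and the refill threshold and step are illustrative choices rather than the values used in the patch.

```c
#include <stdint.h>

/* Hypothetical bit reader state. */
struct bit_reader {
    const unsigned char *in;    /* next unread input byte */
    uint64_t hold;              /* bit accumulator, low bits first */
    unsigned bits;              /* number of valid bits in hold */
};

/* Assemble 8 bytes as a little-endian 64-bit value (endian-neutral). */
static uint64_t load64le(const unsigned char *p)
{
    uint64_t v = 0;
    int i;

    for (i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Return the next n bits (1 <= n <= 15).
 * Assumption: at least 8 readable bytes remain at br->in on each refill. */
static unsigned take_bits(struct bit_reader *br, unsigned n)
{
    unsigned value;

    if (br->bits < 15) {
        /* OR in 8 bytes but count only 6 of them as consumed, so the
         * accumulator never overflows; the extra high bits are re-read
         * with identical values on the next refill. */
        br->hold |= load64le(br->in) << br->bits;
        br->in += 6;
        br->bits += 48;
    }
    value = (unsigned)(br->hold & ((1u << n) - 1));
    br->hold >>= n;
    br->bits -= n;
    return value;
}
```

One refill supplies 48 bits at a time, so the decode loop checks for input far less often than a byte-at-a-time reader; the cost is that the input buffer needs enough trailing slack for the 8-byte load.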

Co-authored-by: Mark Adler <[email protected]>
janaknat and madler authored Sep 23, 2020
1 parent c43185e commit e76d32d
Showing 12 changed files with 1,037 additions and 66 deletions.
13 changes: 10 additions & 3 deletions Makefile.in
@@ -57,7 +57,12 @@ OBJG = compress.o uncompr.o gzclose.o gzlib.o gzread.o gzwrite.o

PIC_OBJZ = adler32.lo adler32_simd.lo crc32.lo deflate.lo infback.lo inffast.lo inflate.lo inftrees.lo trees.lo zutil.lo
PIC_OBJG = compress.lo uncompr.lo gzclose.lo gzlib.lo gzread.lo gzwrite.lo


+ifneq ($(findstring -DINFLATE_CHUNK_SIMD_NEON, $(CFLAGS)),)
+OBJZ += inffast_chunk.o
+PIC_OBJZ += inffast_chunk.lo
+endif

ifneq ($(findstring -DHAS_PCLMUL, $(CFLAGS)),)
OBJZ += crc32_simd.o
PIC_OBJZ += crc32_simd.lo
@@ -277,8 +282,9 @@ gzclose.o gzlib.o gzread.o gzwrite.o: zlib.h zconf.h gzguts.h
compress.o example.o minigzip.o uncompr.o: zlib.h zconf.h
crc32.o: zutil.h zlib.h zconf.h crc32.h
deflate.o: deflate.h zutil.h zlib.h zconf.h
-infback.o inflate.o: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast.h inffixed.h
+infback.o inflate.o: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast.h inffixed.h inffast_chunk.h chunkcopy.h
inffast.o: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast.h
+inffast_chunk.o: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast_chunk.h chunkcopy.h
inftrees.o: zutil.h zlib.h zconf.h inftrees.h
trees.o: deflate.h zutil.h zlib.h zconf.h trees.h

@@ -288,7 +294,8 @@ gzclose.lo gzlib.lo gzread.lo gzwrite.lo: zlib.h zconf.h gzguts.h
compress.lo example.lo minigzip.lo uncompr.lo: zlib.h zconf.h
crc32.lo: zutil.h zlib.h zconf.h crc32.h
deflate.lo: deflate.h zutil.h zlib.h zconf.h
-infback.lo inflate.lo: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast.h inffixed.h
+infback.lo inflate.lo: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast.h inffixed.h inffast_chunk.h chunkcopy.h
inffast.lo: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast.h
+inffast_chunk.lo: zutil.h zlib.h zconf.h inftrees.h inflate.h inffast_chunk.h chunkcopy.h
inftrees.lo: zutil.h zlib.h zconf.h inftrees.h
trees.lo: deflate.h zutil.h zlib.h zconf.h trees.h