use wider loads/stores in Writer on aarch64 #196

Merged 3 commits on Sep 26, 2024

folkertdev
Collaborator

At least on the Raspberry Pi, this brings our performance on par with zlib-ng:

Benchmark 1 (48 runs): ./target/release/examples/blogpost-uncompress ng silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           105ms ± 6.80ms     103ms …  151ms          7 (15%)        0%
  peak_rss           23.9MB ± 47.5KB    23.8MB … 23.9MB          0 ( 0%)        0%
  cpu_cycles          121M  ± 1.54M      119M  …  124M          10 (21%)        0%
  instructions        148M  ±  359       148M  …  148M           0 ( 0%)        0%
  cache_references   33.9M  ± 9.34K     33.9M  … 33.9M           8 (17%)        0%
  cache_misses        864K  ±  109K      787K  … 1.10M           9 (19%)        0%
  branch_misses      1.14M  ± 1.05K     1.14M  … 1.15M           1 ( 2%)        0%
Benchmark 2 (47 runs): ./target/release/examples/blogpost-uncompress rs silesia-small.tar.gz
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           107ms ± 5.01ms     105ms …  140ms          1 ( 2%)          +  1.9% ±  2.3%
  peak_rss           23.9MB ± 44.2KB    23.8MB … 23.9MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles          124M  ± 1.62M      123M  …  128M           0 ( 0%)        💩+  3.3% ±  0.5%
  instructions        190M  ±  349       190M  …  190M           0 ( 0%)        💩+ 28.8% ±  0.0%
  cache_references   31.7M  ± 4.20K     31.7M  … 31.7M           0 ( 0%)        ⚡-  6.4% ±  0.0%
  cache_misses        913K  ±  119K      822K  … 1.16M           0 ( 0%)          +  5.7% ±  5.4%
  branch_misses      1.17M  ± 1.72K     1.17M  … 1.18M           1 ( 2%)        💩+  2.8% ±  0.1%
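
The diff itself is not shown on this page, so the following is only a minimal sketch of what "wider loads/stores" can mean in a copy routine. It assumes a 16-byte chunk (the width of a 128-bit NEON register on aarch64) and uses portable Rust rather than intrinsics. The names `copy_wide` and `CHUNK` are hypothetical and are not taken from this PR; the merged code may look quite different.

```rust
/// Chunk width in bytes. 16 is an assumption chosen to match a 128-bit
/// NEON register; it is not a value taken from the PR.
const CHUNK: usize = 16;

/// Hypothetical helper: copies `src` into the front of `dst` in 16-byte
/// chunks, then finishes the remaining tail one byte at a time.
fn copy_wide(dst: &mut [u8], src: &[u8]) {
    assert!(dst.len() >= src.len(), "destination too small");

    let mut src_chunks = src.chunks_exact(CHUNK);
    let mut dst_chunks = dst[..src.len()].chunks_exact_mut(CHUNK);

    for (d, s) in dst_chunks.by_ref().zip(src_chunks.by_ref()) {
        // Going through a fixed-size array gives the compiler a constant
        // length, which it can typically lower to one wide load and one
        // wide store instead of a byte loop.
        let mut wide = [0u8; CHUNK];
        wide.copy_from_slice(s);
        d.copy_from_slice(&wide);
    }

    // Fewer than CHUNK bytes are left; copy them individually.
    let tail_src = src_chunks.remainder();
    let tail_dst = dst_chunks.into_remainder();
    for (d, s) in tail_dst.iter_mut().zip(tail_src) {
        *d = *s;
    }
}
```

Calling `copy_wide(&mut out, &input)` with a 37-byte `input` would perform two 16-byte chunk copies followed by five single-byte copies. Whether the chunk copies actually compile to single 128-bit instructions depends on the compiler and target, so any gain should be confirmed with a benchmark like the one above.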

@folkertdev folkertdev merged commit f7c2f01 into main Sep 26, 2024
18 checks passed
@folkertdev folkertdev deleted the neon-copy-match branch September 26, 2024 15:04