perf: rework decoder interface #22

Merged
merged 1 commit into from
Oct 15, 2023

ianprime0509
Owner

The updated interface decodes codepoints directly from a reader instead
of running as a state machine. This is considerably more efficient than
the previous implementation, cutting wall time by roughly 27% and CPU
cycles by about 30% on the `token_reader` and `reader` benchmarks:

```
Benchmark 1 (27 runs): zig-out/bin-old/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           188ms ± 14.5ms     168ms …  205ms          0 ( 0%)        0%
  peak_rss           7.31MB ± 58.5KB    7.21MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          688M  ± 4.20M      684M  …  706M           1 ( 4%)        0%
  instructions       1.19G  ± 29.4      1.19G  … 1.19G           0 ( 0%)        0%
  cache_references    412K  ±  763K      239K  … 4.21M           2 ( 7%)        0%
  cache_misses       10.0K  ± 7.40K     7.90K  … 46.8K           2 ( 7%)        0%
  branch_misses       814K  ± 1.37K      813K  …  821K           1 ( 4%)        0%
Benchmark 2 (37 runs): zig-out/bin/token_reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           136ms ± 13.8ms     115ms …  147ms          0 ( 0%)        ⚡- 27.7% ±  3.8%
  peak_rss           7.31MB ± 54.7KB    7.21MB … 7.34MB          8 (22%)          +  0.1% ±  0.4%
  cpu_cycles          462M  ± 1.87M      459M  …  466M           0 ( 0%)        ⚡- 32.8% ±  0.2%
  instructions       1.14G  ± 26.6      1.14G  … 1.14G           0 ( 0%)        ⚡-  4.1% ±  0.0%
  cache_references    236K  ± 4.86K      227K  …  244K           0 ( 0%)          - 42.7% ± 60.7%
  cache_misses       9.40K  ± 1.25K     7.88K  … 11.5K           0 ( 0%)          -  6.5% ± 24.6%
  branch_misses       815K  ± 1.01K      813K  …  817K           0 ( 0%)          +  0.1% ±  0.1%
```
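
As a reading aid, the `delta` column looks like the relative change of each mean against the first run. That is an assumption inferred from the numbers shown, not something the output states, and `percent_delta` below is a hypothetical helper. Because the displayed means are rounded, a recomputed value can differ slightly from the reported one.

```rust
/// Relative change of `new` against `old`, in percent.
/// A negative result means the new build used less of the measured resource.
/// Hypothetical helper, not part of the benchmark tool.
fn percent_delta(old: f64, new: f64) -> f64 {
    (new - old) / old * 100.0
}
```

With the rounded `token_reader` wall-time means, `percent_delta(188.0, 136.0)` comes out just under -27.7, in line with the reported delta.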

```
Benchmark 1 (23 runs): zig-out/bin-old/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           225ms ± 14.2ms     199ms …  249ms          0 ( 0%)        0%
  peak_rss           7.25MB ±  100KB    7.08MB … 7.34MB          0 ( 0%)        0%
  cpu_cycles          823M  ± 12.2M      813M  …  847M           0 ( 0%)        0%
  instructions       1.43G  ± 23.0      1.43G  … 1.43G           0 ( 0%)        0%
  cache_references    757K  ±  129K      635K  … 1.07M           1 ( 4%)        0%
  cache_misses       13.7K  ± 1.18K     12.5K  … 17.2K           2 ( 9%)        0%
  branch_misses      1.43M  ± 3.35K     1.42M  … 1.43M           0 ( 0%)        0%
Benchmark 2 (31 runs): zig-out/bin/reader Gtk-4.0.gir
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           166ms ± 13.9ms     144ms …  175ms          0 ( 0%)        ⚡- 26.5% ±  3.4%
  peak_rss           7.27MB ± 81.8KB    7.08MB … 7.34MB          0 ( 0%)          +  0.3% ±  0.7%
  cpu_cycles          581M  ± 1.54M      579M  …  584M           0 ( 0%)        ⚡- 29.4% ±  0.5%
  instructions       1.38G  ± 16.0      1.38G  … 1.38G           9 (29%)        ⚡-  3.8% ±  0.0%
  cache_references    715K  ±  219K      563K  … 1.71M           3 (10%)          -  5.5% ± 13.6%
  cache_misses       13.5K  ± 1.31K     11.4K  … 16.5K           2 ( 6%)          -  1.2% ±  5.1%
  branch_misses      1.07M  ± 20.3K     1.05M  … 1.11M           5 (16%)        ⚡- 25.3% ±  0.6%
```
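
The decoder code itself is not shown in this description, so the following is only a rough sketch of the two styles being compared, written in Rust for illustration. It assumes UTF-8 input, and `PushDecoder`, `read_codepoint`, and `decode_all` are hypothetical names rather than the project's API. The push-style decoder has to store its progress between calls, while the pull-style function reads the bytes it needs from the reader and keeps its progress in local variables.

```rust
use std::io::{self, Read};

/// Push style (state machine): the caller feeds one byte at a time, so the
/// decoder must remember how far into a sequence it is between calls.
/// Validation is omitted here to keep the contrast short.
#[derive(Default)]
struct PushDecoder {
    cp: u32,
    remaining: u8,
}

impl PushDecoder {
    /// Returns Some(codepoint) once a full sequence has been fed.
    fn feed(&mut self, b: u8) -> Option<u32> {
        if self.remaining == 0 {
            let (remaining, cp) = match b {
                0x00..=0x7F => return Some(b as u32),
                0xC0..=0xDF => (1, b & 0x1F),
                0xE0..=0xEF => (2, b & 0x0F),
                _ => (3, b & 0x07),
            };
            self.remaining = remaining;
            self.cp = cp as u32;
            None
        } else {
            self.cp = (self.cp << 6) | (b & 0x3F) as u32;
            self.remaining -= 1;
            if self.remaining == 0 { Some(self.cp) } else { None }
        }
    }
}

fn invalid(msg: &'static str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg)
}

/// Pull style: read one UTF-8 codepoint directly from `reader`.
/// Returns Ok(None) when the input ends before a new codepoint starts.
fn read_codepoint<R: Read>(reader: &mut R) -> io::Result<Option<char>> {
    let mut first = [0u8; 1];
    if reader.read(&mut first)? == 0 {
        return Ok(None);
    }
    let b0 = first[0];
    // The lead byte determines the total length of the sequence.
    let (len, init) = match b0 {
        0x00..=0x7F => return Ok(Some(b0 as char)),
        0xC2..=0xDF => (2, (b0 & 0x1F) as u32),
        0xE0..=0xEF => (3, (b0 & 0x0F) as u32),
        0xF0..=0xF4 => (4, (b0 & 0x07) as u32),
        _ => return Err(invalid("invalid lead byte")),
    };
    // Fetch all continuation bytes in one call; fails on truncated input.
    let mut rest = [0u8; 3];
    reader.read_exact(&mut rest[..len - 1])?;
    let mut cp = init;
    for &b in &rest[..len - 1] {
        if b & 0xC0 != 0x80 {
            return Err(invalid("invalid continuation byte"));
        }
        cp = (cp << 6) | (b & 0x3F) as u32;
    }
    // Reject overlong forms; char::from_u32 rejects surrogates and
    // values above U+10FFFF.
    let min = [0, 0, 0x80, 0x800, 0x1_0000][len];
    if cp < min {
        return Err(invalid("overlong encoding"));
    }
    char::from_u32(cp).map(Some).ok_or_else(|| invalid("invalid codepoint"))
}

/// Decode every codepoint from `reader` using the pull-style function.
fn decode_all<R: Read>(mut reader: R) -> io::Result<Vec<char>> {
    let mut out = Vec::new();
    while let Some(c) = read_codepoint(&mut reader)? {
        out.push(c);
    }
    Ok(out)
}
```

For example, `decode_all("aé€".as_bytes())` yields the three characters in order. Whether the measured speedup comes from dropping the per-byte state handling in this way is not stated in the description; the sketch only illustrates the structural difference.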

ianprime0509 marked this pull request as ready for review October 15, 2023 02:11
ianprime0509 merged commit 9c6389d into main Oct 15, 2023
3 checks passed
ianprime0509 deleted the perf/encoding branch October 15, 2023 02:15