perf(parser): reduce Token size to 8 bytes from 16 #8153

Draft: wants to merge 5 commits into base: main

branchseer
Contributor

@branchseer branchseer commented Dec 28, 2024

  • Replace end: u32 with len: u16. Ends of long tokens (which are rare) are stored in lexer.long_token_ends;
  • Pack bools into bitflags;
  • Now that end is calculated from start + len, start must be properly set. In some places they were not. This PR fixes them and adds a debug-assertion check in the lexer.
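To make the three points above concrete, here is a minimal sketch of what an 8-byte token could look like. It is not the code from this PR: the field order, the flag names, the `OldToken` layout, the `make_token`/`token_end` helpers, and the choice of a `HashMap` keyed by `start` for `long_token_ends` are all assumptions made for illustration, and `#[repr(C)]` is used only so the sizes are deterministic in the example.

```rust
use std::collections::HashMap;
use std::mem::size_of;

// Hypothetical flag bits; the real names and values in oxc may differ.
const FLAG_ON_NEW_LINE: u8 = 1 << 0;
const FLAG_ESCAPED: u8 = 1 << 1;
const FLAG_LONG: u8 = 1 << 2; // end is stored out of line

// Assumed "before" layout: 13 bytes of data, padded to 16 by u32 alignment.
#[allow(dead_code)]
#[repr(C)]
struct OldToken {
    start: u32,
    end: u32,
    kind: u8,
    is_on_new_line: bool,
    escaped: bool,
    lone_surrogates: bool,
    has_separator: bool,
}

// Assumed "after" layout: 4 + 2 + 1 + 1 = 8 bytes.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug)]
#[repr(C)]
struct Token {
    start: u32,
    len: u16,
    kind: u8,
    flags: u8,
}

// Compile-time checks of the sizes claimed in the PR title.
const _: () = assert!(size_of::<OldToken>() == 16);
const _: () = assert!(size_of::<Token>() == 8);

impl Token {
    fn is_on_new_line(self) -> bool {
        (self.flags & FLAG_ON_NEW_LINE) != 0
    }

    fn escaped(self) -> bool {
        (self.flags & FLAG_ESCAPED) != 0
    }
}

#[derive(Default)]
struct Lexer {
    // Ends of tokens longer than u16::MAX bytes, keyed by token start (assumption).
    long_token_ends: HashMap<u32, u32>,
}

impl Lexer {
    fn make_token(&mut self, kind: u8, start: u32, end: u32, on_new_line: bool, escaped: bool) -> Token {
        // Because end is derived from start + len, start must be set correctly.
        debug_assert!(start <= end, "token start must not be past its end");
        let mut flags = 0u8;
        if on_new_line {
            flags |= FLAG_ON_NEW_LINE;
        }
        if escaped {
            flags |= FLAG_ESCAPED;
        }
        let len = match u16::try_from(end - start) {
            Ok(len) => len,
            Err(_) => {
                // Rare path: the length does not fit in u16, so store the end separately.
                self.long_token_ends.insert(start, end);
                flags |= FLAG_LONG;
                0
            }
        };
        Token { start, len, kind, flags }
    }

    fn token_end(&self, token: Token) -> u32 {
        if (token.flags & FLAG_LONG) != 0 {
            self.long_token_ends[&token.start]
        } else {
            token.start + u32::from(token.len)
        }
    }
}
```

In this sketch every call to `token_end` pays for a flag test plus an addition (or a map lookup on the rare long path), which is the kind of extra per-token work that could plausibly cost time in the lexer while the smaller struct is cheaper to copy around in the parser.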

@github-actions github-actions bot added the A-parser (Area - Parser) and C-performance (Category - Solution not expected to change functional behavior, only performance) labels Dec 28, 2024
@branchseer branchseer changed the title from "perf(parser): reduce Token size to 8 bytes from 12" to "perf(parser): reduce Token size to 8 bytes from 16" Dec 28, 2024

codspeed-hq bot commented Dec 28, 2024

CodSpeed Performance Report

Merging #8153 will degrade performances by 13.22%

Comparing branchseer:token_eight_bytes (0024247) with main (63eb298)

Summary

❌ 5 regressions
✅ 27 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | main | branchseer:token_eight_bytes | Change |
|---|---|---|---|
| lexer[RadixUIAdoptionSection.jsx] | 20.4 µs | 22.5 µs | -9.29% |
| lexer[antd.js] | 22 ms | 25 ms | -12.1% |
| lexer[cal.com.tsx] | 5.5 ms | 6.3 ms | -12.64% |
| lexer[checker.ts] | 13.2 ms | 14.8 ms | -10.7% |
| lexer[pdf.mjs] | 3.6 ms | 4.1 ms | -13.22% |

@Boshen Boshen marked this pull request as draft December 28, 2024 06:03
@branchseer
Contributor Author

My local bench run shows the same regression on lexer, but also shows noticeable improvements on parser.

I guess the lexer regression makes sense, since the lexer now does more calculation but barely copies tokens on its own.

Here's my local bench result of parser:

cargo bench --bench parser --no-default-features --features parser -- --baseline arm64
    Finished `bench` profile [optimized] target(s) in 0.30s
     Running benches/parser.rs (target/release/deps/parser-63bb1ee5f39a6738)
parser/checker.ts       time:   [9.0814 ms 9.1062 ms 9.1461 ms]
                        change: [−2.0830% −1.7180% −1.2430%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  9 (9.00%) high mild
  5 (5.00%) high severe
parser/cal.com.tsx      time:   [5.0674 ms 5.0756 ms 5.0841 ms]
                        change: [−3.0402% −2.8004% −2.5644%] (p = 0.00 < 0.05)
                        Performance has improved.
parser/RadixUIAdoptionSection.jsx
                        time:   [6.1189 µs 6.1512 µs 6.2121 µs]
                        change: [−7.3368% −6.5958% −5.7452%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  6 (6.00%) high mild
  6 (6.00%) high severe
parser/pdf.mjs          time:   [2.9626 ms 2.9652 ms 2.9681 ms]
                        change: [−0.8735% −0.7438% −0.6150%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
parser/antd.js          time:   [18.797 ms 18.815 ms 18.835 ms]
                        change: [−1.3170% −1.1375% −0.9604%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

I suspected it was a cpu arch thing so I ran the parser bench under rosetta 2, only to see even more improvements:

cargo bench --bench parser --no-default-features --features parser --target x86_64-apple-darwin -- --baseline x86_64
    Finished `bench` profile [optimized] target(s) in 0.11s
     Running benches/parser.rs (target/x86_64-apple-darwin/release/deps/parser-fbba37f46cde4093)
parser/checker.ts       time:   [13.672 ms 13.703 ms 13.739 ms]
                        change: [−2.7911% −2.4699% −2.1505%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  7 (7.00%) high mild
  6 (6.00%) high severe
parser/cal.com.tsx      time:   [7.2164 ms 7.2259 ms 7.2374 ms]
                        change: [−4.1010% −3.9361% −3.7532%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe
parser/RadixUIAdoptionSection.jsx
                        time:   [10.517 µs 10.525 µs 10.533 µs]
                        change: [−14.409% −14.120% −13.846%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
parser/pdf.mjs          time:   [4.4398 ms 4.4467 ms 4.4581 ms]
                        change: [−2.3575% −2.1154% −1.8130%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  6 (6.00%) high mild
  7 (7.00%) high severe
parser/antd.js          time:   [27.019 ms 27.038 ms 27.062 ms]
                        change: [−3.3556% −3.2324% −3.1183%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

Now I'm lost. @overlookmotel any insight on this?

@overlookmotel
Contributor

overlookmotel commented Jan 8, 2025

Thanks for investigating further. I did spend a couple of hours looking at this yesterday and scratching my head. I had to stop because I could feel a rabbit hole coming on and I had other tasks I needed to get on with!

Please give me a few days to mull it over and I'll come back to you with some ideas.

Also, #8298 may have an effect, as lots of work on Token involves converting it to Span. I have some stuff to investigate on that PR before it's ready to merge, but once it is merged, it may affect perf on this PR too (hopefully positively!).

One question in the meantime:

"I suspected it was a cpu arch thing so I ran the parser bench under rosetta 2, only to see even more improvements"

What effect did you expect Rosetta 2 to have? Rosetta is an x86_64 emulator, right? (just checking I do know what I think I know!)

@overlookmotel overlookmotel self-requested a review January 8, 2025 13:58
@branchseer
Contributor Author

Yeah, I ran Rosetta 2 to check the bench result under x86_64. It was wishful thinking on my part: if Rosetta 2 gave the same result as CodSpeed, that would prove the improvements occur only on specific CPU archs (Apple arm64).

@Boshen
Member

Boshen commented Jan 15, 2025

We won't be able to merge this PR due to conflicting results from the benchmark, but I can merge the token API change so that we can focus on changing the token shape next time.

@overlookmotel
Contributor

Personally, I don't think we should merge until we have confirmation that the perf regression that's showing on CodSpeed does not also affect "real" x86_64 (Rosetta is not x86, but an ARM chip pretending to be x86, so I'm not sure if it's representative). Also I would like to finish up #8298 and see what this looks like on top of that.

I do have this on my radar, please give me a couple of days.

@overlookmotel
Contributor

overlookmotel commented Jan 17, 2025

Not forgotten you! I found a problem with lexer benchmarks, which may affect this. Will be interesting to see what this looks like once #8573 and #8298 are merged.

@overlookmotel
Contributor

Those 2 PRs are now merged. @branchseer Could you please rebase this on latest main so we can see if benchmarks shift at all?

By the way, I tried a few optimizations to Token myself, but got similarly weird results - +2% on lexer benchmarks, but -1% on parser benchmarks on #8576. 🤷

# Conflicts:
#	crates/oxc_parser/src/lexer/token.rs
# Conflicts:
#	crates/oxc_parser/src/modifiers.rs