
Add TorchAO speedup metric vs eager #6178

Merged · 8 commits · Jan 16, 2025

Conversation

@huydhn (Contributor) commented Jan 16, 2025

Addresses the first part of #6176

This PR adds another speedup metric vs eager. Because this is the TorchAO dashboard, I think it's more appropriate to show TorchAO vs compile and TorchAO vs eager instead of TorchAO vs compile and compile vs eager, because the last one (compile vs eager) is a better fit for the PT2 inductor dashboard. @jerryzh168 What do you think?

I also fixed another UX issue so that the oldest commit in the time range is shown as the base commit.

Testing

https://torchci-git-fork-huydhn-improve-ao-speedup-metric-fbopensource.vercel.app/benchmark/llms?startTime=Thu%2C%2009%20Jan%202025%2010%3A21%3A42%20GMT&stopTime=Thu%2C%2016%20Jan%202025%2010%3A21%3A42%20GMT&granularity=day&lBranch=main&lCommit=2cddc67fe700579043e3e2d395d983764298b82e9746e9b2663c583710b3b08c&rBranch=main&rCommit=399034112cd82562f0d651bda8a8b5ab8840703ee0b40cd136d85181164d2280&repoName=pytorch%2Fao&modelName=All%20Models&backendName=All%20Backends&dtypeName=All%20DType&deviceName=All%20Devices

@huydhn huydhn requested a review from jerryzh168 January 16, 2025 04:15

vercel bot commented Jan 16, 2025

@huydhn is attempting to deploy a commit to the Meta Open Source Team on Vercel.

A member of the Team first needs to authorize it.

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 16, 2025

vercel bot commented Jan 16, 2025

The latest updates on your projects.

| Name | Status | Preview | Updated (UTC) |
|---|---|---|---|
| torchci | ✅ Ready | Visit Preview | Jan 16, 2025 10:38pm |

@jerryzh168 commented:

Thanks @huydhn, I feel we should also show compile vs eager since we have a different set of models compared to the PT2 inductor dashboard. Replied in DM with an example picture.

@huydhn (Contributor, Author) commented Jan 16, 2025

I have added a new chart for compile vs eager. However, squeezing them all into one chart is trickier than I expected, so I think I'll stack the 3 of them on top of each other for now and create an issue to figure out a way to do it later. When we have more data, the data points from the 3 charts will line up, hopefully giving the impression of being one chart.

The fundamental problem with squeezing them into one chart is that they are 3 different series, because the 3 speedup values are calculated separately, and the current HUD chart implementation accepts only one series at a time. A proper implementation would likely require rewriting or extending the HUD chart implementation to accept multiple series. It's not a small task, unfortunately.

@huydhn (Contributor, Author) commented Jan 16, 2025

I think one point needs further clarification. I'm seeing 2 different methodologies here.

  1. The PT2 inductor dashboard compares torch.compile vs eager on the same commit. Effectively, the eager speedup is always 1, and if torch.compile can't beat that with a speedup larger than 1, people will fall back to eager. The geomean chart on https://hud.pytorch.org/benchmark/compilers is implemented this way.
  2. The second approach is to compare torch.compile vs eager on the base commit, which is not what we use at the moment. If we adopt it here, speedup on the AO dashboard would have a different meaning than speedup on the PT2 inductor dashboard.

This seems like a major source of confusion to me, so I was initially using the same approach as the PT2 inductor dashboard. Note that we can use the value from the base commit and implement (2) just for TorchAO. I have implemented (2) in this current iteration, but I want to make sure that it's a sound approach to have.
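To make the difference between the two methodologies concrete, here is a minimal sketch with made-up latency numbers (the dict keys and values are illustrative, not the actual HUD schema):

```python
# Hypothetical latencies in ms; none of these numbers come from the dashboard.
base = {"eager": 10.0, "compile": 8.0}    # measured at the base commit
current = {"eager": 9.0, "compile": 6.0}  # measured at the current commit

# Methodology (1), PT2 inductor style: compare against eager on the *same*
# commit, so eager's own speedup is always 1.0 by construction.
speedup_same_commit = current["eager"] / current["compile"]  # 9.0 / 6.0 = 1.5

# Methodology (2): compare against eager on the *base* commit, so drift in
# the eager baseline itself becomes visible in the metric.
speedup_vs_base = base["eager"] / current["compile"]  # 10.0 / 6.0 ≈ 1.67

print(speedup_same_commit, speedup_vs_base)
```

Note how the two definitions disagree whenever the eager baseline itself moves between the base and current commits, which is exactly the ambiguity discussed above.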

@jerryzh168 commented:

> I have added a new chart for compile vs eager. However, squeezing them all into one chart is trickier than I expected, so I think I'll stack the 3 of them on top of each other for now and create an issue to figure out a way to do it later. When we have more data, the data points from the 3 charts will line up, hopefully giving the impression of being one chart.
>
> The fundamental problem with squeezing them into one chart is that they are 3 different series, because the 3 speedup values are calculated separately, and the current HUD chart implementation accepts only one series at a time. A proper implementation would likely require rewriting or extending the HUD chart implementation to accept multiple series. It's not a small task, unfortunately.

I see. One way to reduce the data points could be selecting a specific device by default, I think.

@huydhn (Contributor, Author) commented Jan 16, 2025

> I see. One way to reduce the data points could be selecting a specific device by default, I think.

That won't work, unfortunately. It's not about the number of data points but the number of series, which needs to be 3: compile_vs_eager, autoquant_vs_compile, and autoquant_vs_eager. The current HUD chart implementation doesn't even work with 2. That's what the issue is about.

@jerryzh168 commented:

> I think one point needs further clarification. I'm seeing 2 different methodologies here.
>
>   1. The PT2 inductor dashboard compares torch.compile vs eager on the same commit. Effectively, the eager speedup is always 1, and if torch.compile can't beat that with a speedup larger than 1, people will fall back to eager. The geomean chart on hud.pytorch.org/benchmark/compilers is implemented this way.
>   2. The second approach is to compare torch.compile vs eager on the base commit, which is not what we use at the moment. If we adopt it here, speedup on the AO dashboard would have a different meaning than speedup on the PT2 inductor dashboard.
>
> This seems like a major source of confusion to me, so I was initially using the same approach as the PT2 inductor dashboard. Note that we can use the value from the base commit and implement (2) just for TorchAO. I have implemented (2) in this current iteration, but I want to make sure that it's a sound approach to have.

Yeah, (2) is mainly to catch regressions/improvements of the eager and compile baselines that happen over time.

If we want to match inductor, we can implement (1) but also keep a separate comparison against a base commit.

Three curves:
(1) current eager perf vs base-commit eager
(2) compile perf vs current eager
(3) autoquant perf on top of current compile

This way we can understand the perf improvement for an end user who (1) tested eager mode performance at the base commit and (2) tests eager, compile, and autoquant performance again now, over time.
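The three proposed curves above can be sketched with a few lines of arithmetic. All latency numbers and variable names below are made up for illustration; they do not come from the dashboard:

```python
# Illustrative latencies in ms.
base_eager = 10.0                       # eager at the base commit
eager, compiled, autoquant = 9.0, 6.0, 5.0  # current commit

# Curve (1): current eager vs base-commit eager — catches drift in the
# eager baseline itself over time.
eager_vs_base = base_eager / eager

# Curve (2): compile vs current eager — the benefit of torch.compile today.
compile_vs_eager = eager / compiled

# Curve (3): autoquant on top of current compile — the extra benefit of
# TorchAO autoquant beyond what compile already gives.
autoquant_vs_compile = compiled / autoquant

print(eager_vs_base, compile_vs_eager, autoquant_vs_compile)
```

Because each curve uses a different denominator, they are genuinely three separate series, which is why they cannot be collapsed into the single-series HUD chart discussed earlier.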

@jerryzh168 commented:

> I see. One way to reduce the data points could be selecting a specific device by default, I think.
>
> That won't work, unfortunately. It's not about the number of data points but the number of series, which needs to be 3: compile_vs_eager, autoquant_vs_compile, and autoquant_vs_eager. The current HUD chart implementation doesn't even work with 2. That's what the issue is about.

Yeah, this makes sense.

@jerryzh168 left a review comment:

Thanks a lot!

@huydhn huydhn merged commit cb2e2d9 into pytorch:main Jan 16, 2025
6 checks passed