
Add TorchAO speedup metric vs eager #6178

Merged · 8 commits · Jan 16, 2025

Conversation

@huydhn (Contributor) commented Jan 16, 2025

Addresses the first part of #6176

This PR adds another speedup metric vs eager. Because this is the TorchAO dashboard, I think it's more appropriate to show TorchAO vs compile and TorchAO vs eager instead of TorchAO vs compile and compile vs eager, because the last one (compile vs eager) is a better fit for the PT2 inductor dashboard. @jerryzh168 What do you think?

I also fixed another UX issue so that the oldest commit in the time range is shown as the base commit.

Testing

https://torchci-git-fork-huydhn-improve-ao-speedup-metric-fbopensource.vercel.app/benchmark/llms?startTime=Thu%2C%2009%20Jan%202025%2010%3A21%3A42%20GMT&stopTime=Thu%2C%2016%20Jan%202025%2010%3A21%3A42%20GMT&granularity=day&lBranch=main&lCommit=2cddc67fe700579043e3e2d395d983764298b82e9746e9b2663c583710b3b08c&rBranch=main&rCommit=399034112cd82562f0d651bda8a8b5ab8840703ee0b40cd136d85181164d2280&repoName=pytorch%2Fao&modelName=All%20Models&backendName=All%20Backends&dtypeName=All%20DType&deviceName=All%20Devices

@huydhn huydhn requested a review from jerryzh168 January 16, 2025 04:15

vercel bot commented Jan 16, 2025

@huydhn is attempting to deploy a commit to the Meta Open Source Team on Vercel.

A member of the Team first needs to authorize it.

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jan 16, 2025

vercel bot commented Jan 16, 2025

The latest updates on your projects.

| Name | Status | Preview | Updated (UTC) |
|---|---|---|---|
| torchci | ✅ Ready | Visit Preview | Jan 16, 2025 10:38pm |

@jerryzh168 commented:

Thanks @huydhn, I feel we should also show compile vs eager since we have a different set of models compared to the PT2 inductor dashboard. Replied in DM with an example picture.

@huydhn (Contributor, Author) commented Jan 16, 2025

I have added a new chart for compile vs eager. However, squeezing them all into one chart is trickier than I expected, so I think I'll stack the 3 of them on top of each other for now and create an issue to figure out a way to do it later. When we have more data, the data points from the 3 charts will line up, hopefully giving the impression of being one chart.

The fundamental problem with squeezing them into one chart is that they are 3 different series, because the 3 speedup values are calculated separately, and the current HUD chart implementation accepts only one series at a time. A proper implementation would likely require rewriting or extending the HUD chart implementation to accept multiple series. It's not a small task, unfortunately.

@huydhn (Contributor, Author) commented Jan 16, 2025

I think one point needs further clarification. I'm seeing 2 different methodologies here.

  1. The PT2 inductor dashboard compares torch.compile vs eager on the same commit. Effectively, the eager speedup is always 1, and if torch.compile can't beat that with a speedup larger than 1, people will fall back to eager. The geomean chart on https://hud.pytorch.org/benchmark/compilers is implemented this way.
  2. The second approach is to compare torch.compile vs eager on the base commit, which is not what we use at the moment. If we adopt it here, speedup on the AO dashboard would have a different meaning than speedup on the PT2 inductor dashboard.

This seems like a major source of confusion to me, so I was initially using the same approach as the PT2 inductor dashboard. Note that we can use the value from the base commit and implement (2) just for TorchAO. I have implemented (2) in this current iteration, but I want to make sure that it's a sound approach to have.
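To make the difference between the two methodologies concrete, here is a minimal sketch with made-up latency numbers (the dict keys and values are illustrative, not the actual HUD schema):

```python
# Hypothetical latencies in ms; none of these numbers come from the dashboard.
base = {"eager": 10.0, "compile": 8.0}    # measured at the base commit
current = {"eager": 9.0, "compile": 6.0}  # measured at the current commit

# Methodology (1), PT2 inductor style: compare against eager on the *same*
# commit, so eager's own speedup is always 1.0 by construction.
speedup_same_commit = current["eager"] / current["compile"]  # 9.0 / 6.0 = 1.5

# Methodology (2): compare against eager on the *base* commit, so drift in
# the eager baseline itself becomes visible in the metric.
speedup_vs_base = base["eager"] / current["compile"]  # 10.0 / 6.0 ≈ 1.67

print(speedup_same_commit, speedup_vs_base)
```

Note how the two definitions disagree whenever the eager baseline itself moves between the base and current commits, which is exactly the ambiguity discussed above.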

@jerryzh168 commented:

> I have added a new chart for compile vs eager. However, squeezing them all into one chart is trickier than I expected, so I think I'll stack the 3 of them on top of each other for now and create an issue to figure out a way to do it later. When we have more data, the data points from the 3 charts will line up, hopefully giving the impression of being one chart.
>
> The fundamental problem with squeezing them into one chart is that they are 3 different series, because the 3 speedup values are calculated separately, and the current HUD chart implementation accepts only one series at a time. A proper implementation would likely require rewriting or extending the HUD chart implementation to accept multiple series. It's not a small task, unfortunately.

I see. One way to reduce the data points could be selecting a specific device by default, I think.

@huydhn (Contributor, Author) commented Jan 16, 2025

> I see. One way to reduce the data points could be selecting a specific device by default, I think.

That won't work, unfortunately. It's not about the number of data points but the number of series, which needs to be 3: compile_vs_eager, autoquant_vs_compile, and autoquant_vs_eager. The current HUD chart implementation doesn't even work with 2. That's what the issue is about.

@jerryzh168 commented:

> I think one point needs further clarification. I'm seeing 2 different methodologies here.
>
>   1. The PT2 inductor dashboard compares torch.compile vs eager on the same commit. Effectively, the eager speedup is always 1, and if torch.compile can't beat that with a speedup larger than 1, people will fall back to eager. The geomean chart on hud.pytorch.org/benchmark/compilers is implemented this way.
>   2. The second approach is to compare torch.compile vs eager on the base commit, which is not what we use at the moment. If we adopt it here, speedup on the AO dashboard would have a different meaning than speedup on the PT2 inductor dashboard.
>
> This seems like a major source of confusion to me, so I was initially using the same approach as the PT2 inductor dashboard. Note that we can use the value from the base commit and implement (2) just for TorchAO. I have implemented (2) in this current iteration, but I want to make sure that it's a sound approach to have.

Yeah, (2) is mainly to catch regressions/improvements of the eager and compile baselines that happen over time.

If we want to match inductor, we can implement (1) but also keep a separate comparison against a base commit.

Three curves:
(1) current eager perf vs base-commit eager
(2) compile perf vs current eager
(3) autoquant perf on top of current compile

This way we can understand the perf improvement for an end user who (1) tested eager mode performance at the base commit and (2) tests eager, compile, and autoquant performance again now, over time.
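The three proposed curves above can be sketched with a few lines of arithmetic. All latency numbers and variable names below are made up for illustration; they do not come from the dashboard:

```python
# Illustrative latencies in ms.
base_eager = 10.0                       # eager at the base commit
eager, compiled, autoquant = 9.0, 6.0, 5.0  # current commit

# Curve (1): current eager vs base-commit eager — catches drift in the
# eager baseline itself over time.
eager_vs_base = base_eager / eager

# Curve (2): compile vs current eager — the benefit of torch.compile today.
compile_vs_eager = eager / compiled

# Curve (3): autoquant on top of current compile — the extra benefit of
# TorchAO autoquant beyond what compile already gives.
autoquant_vs_compile = compiled / autoquant

print(eager_vs_base, compile_vs_eager, autoquant_vs_compile)
```

Because each curve uses a different denominator, they are genuinely three separate series, which is why they cannot be collapsed into the single-series HUD chart discussed earlier.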

@jerryzh168 commented:

> I see. One way to reduce the data points could be selecting a specific device by default, I think.
>
> That won't work, unfortunately. It's not about the number of data points but the number of series, which needs to be 3: compile_vs_eager, autoquant_vs_compile, and autoquant_vs_eager. The current HUD chart implementation doesn't even work with 2. That's what the issue is about.

Yeah, this makes sense.

@jerryzh168 left a review comment:

Thanks a lot!

@huydhn huydhn merged commit cb2e2d9 into pytorch:main Jan 16, 2025
6 checks passed