Add TorchAO speedup metric vs eager #6178
Conversation
@huydhn is attempting to deploy a commit to the Meta Open Source Team on Vercel. A member of the Team first needs to authorize it.
Thanks @huydhn, I feel we should also show compile vs. eager since we have a different set of models compared to the PT2 inductor dashboard. Replied in DM with an example picture.
I have added a new chart for compile vs eager. However, squeezing them all into one chart is trickier than I expected, so I'll stack the 3 of them on top of each other for now and create an issue to figure out a better way later. When we have more data, the data points from the 3 charts will line up, which I hope will give the impression of a single chart. The fundamental problem with squeezing them into one chart is that they are 3 different series, because the 3 speedup values are calculated separately, and the current HUD chart implementation accepts only one series at a time. So a proper implementation would likely require rewriting / extending the HUD chart implementation to accept multiple series. It's not a small task, unfortunately.
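For illustration, a rough sketch of why the stacked layout is the path of least resistance (the types and function below are hypothetical, not the actual HUD chart code): each speedup metric is its own series, and a single-series chart can only consume one of them at a time.

```ts
// Hypothetical shapes, for illustration only.
interface SpeedupPoint {
  granularityBucket: string; // e.g. "2025-01-15"
  value: number;             // speedup ratio for that bucket
}

interface SpeedupSeries {
  name: string;              // e.g. "autoquant vs eager"
  points: SpeedupPoint[];
}

// With a single-series chart, three series means three stacked charts.
// A multi-series chart component would accept SpeedupSeries[] directly.
function splitIntoCharts(series: SpeedupSeries[]): SpeedupSeries[][] {
  return series.map((s) => [s]); // one chart per series for now
}
```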
I think one point needs further clarification: I'm seeing 2 different methodologies here. This seems like a major source of confusion to me.
Force-pushed from d33f6cc to a3df197 (Compare)
I see. One way to reduce the data points could be selecting a specific device by default, I think.
That won't work, unfortunately. It's not about the number of data points but the number of series, which needs to be 3 in this case.
Yeah, (2) is mainly to catch regressions/improvements of the eager and compile baselines that happen over time. If we want to match inductor, we can implement (1) but keep a separate comparison to a base commit (3 curves); this way we can understand the perf improvements for the end user. That is: (1) test eager mode performance at the base commit, and (2) test eager, compile, and autoquant performance again now, throughout time.
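To make that methodology concrete, a minimal sketch (field and function names are made up, not the real benchmark schema): eager latency is measured once at the base commit, every later commit contributes eager, compile, and autoquant latencies, and each is plotted as a speedup over that fixed baseline, giving the 3 curves mentioned above.

```ts
// Illustrative only; not the actual benchmark record format.
interface LatencyRecord {
  commit: string;
  eagerMs: number;
  compileMs: number;
  autoquantMs: number;
}

// baseEagerMs: eager latency measured once at the base commit (step 1).
// records: eager/compile/autoquant latencies per later commit (step 2).
function speedupCurves(baseEagerMs: number, records: LatencyRecord[]) {
  return records.map((r) => ({
    commit: r.commit,
    eagerVsBase: baseEagerMs / r.eagerMs,         // catches eager regressions
    compileVsBase: baseEagerMs / r.compileMs,     // catches compile regressions
    autoquantVsBase: baseEagerMs / r.autoquantMs, // end-user improvement
  }));
}
```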
Yeah, this makes sense.
This reverts commit a3df197.
Thanks a lot!
Addresses the first part of #6176
This PR adds another speedup metric, vs eager. Because this is the TorchAO dashboard, I think it's more appropriate to show TorchAO vs compile and TorchAO vs eager instead of TorchAO vs compile and compile vs eager, because the last one (compile vs eager) is a better fit for the PT2 inductor dashboard. @jerryzh168 What do you think?
I also fixed another UX issue so that the oldest commit in the time range is shown as the base commit instead.
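A small sketch of both changes for clarity (names are illustrative, not the actual dashboard code): the base commit is simply the oldest commit in the selected time range, and the two speedup metrics shown are TorchAO (autoquant) vs compile and TorchAO vs eager.

```ts
// Illustrative only; not the real query result shape.
interface CommitResult {
  commit: string;
  timestampS: number; // epoch seconds
  eagerMs: number;
  compileMs: number;
  autoquantMs: number;
}

// The oldest commit in the selected time range serves as the base commit.
function pickBaseCommit(results: CommitResult[]): CommitResult {
  return results.reduce((oldest, r) =>
    r.timestampS < oldest.timestampS ? r : oldest
  );
}

// The two speedup metrics shown on the dashboard for a given commit.
function speedups(r: CommitResult) {
  return {
    torchaoVsCompile: r.compileMs / r.autoquantMs,
    torchaoVsEager: r.eagerMs / r.autoquantMs,
  };
}
```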
Testing
https://torchci-git-fork-huydhn-improve-ao-speedup-metric-fbopensource.vercel.app/benchmark/llms?startTime=Thu%2C%2009%20Jan%202025%2010%3A21%3A42%20GMT&stopTime=Thu%2C%2016%20Jan%202025%2010%3A21%3A42%20GMT&granularity=day&lBranch=main&lCommit=2cddc67fe700579043e3e2d395d983764298b82e9746e9b2663c583710b3b08c&rBranch=main&rCommit=399034112cd82562f0d651bda8a8b5ab8840703ee0b40cd136d85181164d2280&repoName=pytorch%2Fao&modelName=All%20Models&backendName=All%20Backends&dtypeName=All%20DType&deviceName=All%20Devices