Feature request: GPU benchmarks for ML workloads. #160
I love what you are doing at sparecores!
I was trying to find good benchmarks that compare the GPU instances offered by different cloud providers. Unfortunately, I was unable to find any good comparison of ML-workload performance across the different GPU instances. Even when I tried to find a comparison between the GPUs that these instances contain, I could not find a good benchmark covering them.
It would be great if you could add some GPU benchmarks to sparecores that test common ML workloads like LLM/ResNet training and inference. I am not sure if there is already a good benchmark suite that you could run.
Maybe this repository is not the correct place for feature requests. Let me know if you want to move it somewhere else.
Comments
Thanks for this request! We are actually working on LLM inference speed benchmarks, which I was hoping to ship in a week or so, but we hit a problem with … We also have plans to support other benchmarks: e.g. we started GBM model training benchmarks on CPU and GPU as well, following @szilard's related benchmarks, but that was put back on the backlog due to other priorities. I think we can pick it up after the above-mentioned LLM-speed updates.
Great to hear that! Looking forward to seeing the data once it is ready. P.S. I will add two more feature requests. Maybe you are already working on those as well, but I just wanted to mention that there is interest in them.
A PR for the LLM inference speed benchmarks is now open at SpareCores/sc-images#1 -- @ywilke, could you please take a look? I'd love to hear any feedback, e.g. whether the covered benchmark scenarios (prompt processing with 16-32k tokens and text generation with 1-8k tokens) and the selected models seem useful, or if you have any recommendations.
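For anyone wanting to try something similar locally, here is a minimal sketch of the two scenarios mentioned above. It assumes llama.cpp's llama-bench as the measurement tool (one common choice for this kind of benchmark, not necessarily the exact setup in the PR), with the model path as a hypothetical placeholder:

```python
# Minimal sketch of the two benchmark scenarios discussed above, using
# llama.cpp's llama-bench as one common measurement tool.
# Assumptions (not taken from the PR): llama-bench is on the PATH and
# MODEL points to a local GGUF model file.
import subprocess

MODEL = "model.gguf"  # hypothetical placeholder path

subprocess.run(
    [
        "llama-bench",
        "-m", MODEL,
        "-p", "16384,32768",          # prompt processing: 16k and 32k tokens
        "-n", "1024,2048,4096,8192",  # text generation: 1k-8k tokens
    ],
    check=True,
)
```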
That’s great to see! Unfortunately, I don’t have much insight into why specific ML workloads perform better on one GPU than another. From what I’ve heard, workloads are often constrained by either memory or compute, but I’m not sure how much that differs between LLMs and models like ResNet. It would also be interesting to see a distinction between training and inference speed: are there significant differences in how these bottlenecks manifest for LLM inference? Looking forward to the benchmarks!
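On the memory-vs-compute question, one way to make it concrete: in single-stream LLM text generation, every generated token reads all model weights from memory once, so throughput is typically bounded by memory bandwidth rather than compute. A back-of-the-envelope sketch with purely illustrative numbers (none of these come from the thread):

```python
# Rough upper bound on memory-bandwidth-bound generation speed.
# All numbers are illustrative assumptions, not measured data.
model_params = 7e9        # 7B-parameter model
bytes_per_param = 2       # fp16 weights
mem_bandwidth = 1.0e12    # ~1 TB/s of GPU memory bandwidth

weight_bytes = model_params * bytes_per_param      # ~14 GB read per token
max_tokens_per_sec = mem_bandwidth / weight_bytes  # bandwidth-bound ceiling
print(f"~{max_tokens_per_sec:.0f} tokens/s")       # ~71 tokens/s
```

Prompt processing, by contrast, batches many tokens against each weight read, so it tends to be compute-bound; that difference is one reason prompt processing and text generation are benchmarked as separate scenarios.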