Feature request: GPU benchmarks for ML workloads. #160
I love what you are doing at sparecores!
I was trying to find good benchmarks that compare the GPU instances offered by different cloud providers. Unfortunately, I was unable to find any good comparison of ML-workload performance across the different GPU instances. Even when I tried to find a comparison between the GPUs that these instances contain, I could not find a good benchmark covering them.
It would be great if you could add some GPU benchmarks to sparecores that test common ML workloads like LLM/ResNet training and inference. I am not sure if there is already a good benchmark suite that you could run.
Maybe this repository is not the correct place for feature requests. Let me know if you want to move it somewhere else.
Comments
Thanks for this request! We are actually working on LLM inference speed benchmarks, which I was hoping to ship in a week or so, but we hit a problem with … We also have plans to support other benchmarks: e.g. we started GBM model training benchmarks on CPU and GPU as well, following @szilard's related benchmarks, but that was put back on the backlog due to other priorities. I think we can pick it up after the above-mentioned LLM-speed updates.
Great to hear that! Looking forward to seeing the data once it is ready. P.S. I will add two more feature requests. Maybe you are already working on those as well, but I just wanted to mention that there is interest in them.
A PR for the LLM inference speed benchmarks is now open at SpareCores/sc-images#1 -- @ywilke, could you please take a look? I'd love to hear any feedback, e.g. whether the covered benchmark scenarios (prompt processing with 16-32k tokens and text generation with 1-8k tokens) and the selected models seem useful, or if you have any recommendations.
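For anyone wanting to try something similar locally, here is a minimal sketch of the two scenarios mentioned above. It assumes llama.cpp's llama-bench as the measurement tool (one common choice for this kind of benchmark, not necessarily the exact setup in the PR), with the model path as a hypothetical placeholder:

```python
# Minimal sketch of the two benchmark scenarios discussed above, using
# llama.cpp's llama-bench as one common measurement tool.
# Assumptions (not taken from the PR): llama-bench is on the PATH and
# MODEL points to a local GGUF model file.
import subprocess

MODEL = "model.gguf"  # hypothetical placeholder path

subprocess.run(
    [
        "llama-bench",
        "-m", MODEL,
        "-p", "16384,32768",          # prompt processing: 16k and 32k tokens
        "-n", "1024,2048,4096,8192",  # text generation: 1k-8k tokens
    ],
    check=True,
)
```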
That’s great to see! Unfortunately, I don’t have much insight into why specific ML workloads perform better on one GPU than another. From what I’ve heard, workloads are often constrained by either memory or compute, but I’m not sure how much that differs between LLMs and models like ResNet. It would also be interesting to see a distinction between training and inference speed: are there significant differences in how these bottlenecks manifest for LLM inference? Looking forward to the benchmarks!
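On the memory-vs-compute question, one way to make it concrete: in single-stream LLM text generation, every generated token reads all model weights from memory once, so throughput is typically bounded by memory bandwidth rather than compute. A back-of-the-envelope sketch with purely illustrative numbers (none of these come from the thread):

```python
# Rough upper bound on memory-bandwidth-bound generation speed.
# All numbers are illustrative assumptions, not measured data.
model_params = 7e9        # 7B-parameter model
bytes_per_param = 2       # fp16 weights
mem_bandwidth = 1.0e12    # ~1 TB/s of GPU memory bandwidth

weight_bytes = model_params * bytes_per_param      # ~14 GB read per token
max_tokens_per_sec = mem_bandwidth / weight_bytes  # bandwidth-bound ceiling
print(f"~{max_tokens_per_sec:.0f} tokens/s")       # ~71 tokens/s
```

Prompt processing, by contrast, batches many tokens against each weight read, so it tends to be compute-bound; that difference is one reason prompt processing and text generation are benchmarked as separate scenarios.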