Improve Block pointer GEMM performance #3328

Open
alexbaden opened this issue Feb 1, 2025 · 0 comments

Comparing a simple GEMM kernel using block pointers with PyTorch (oneDNN) on PVC 1100:

| Kernel | GB/s | TFLOPS | Triton % of Torch TFLOPS |
| --- | --- | --- | --- |
| Torch A x B | 220.7031745 | 136.4162522 | |
| Triton A x B | 174.9541537 | 108.8349622 | 79.78% |
| Torch A x B^T | 203.4142925 | 126.5652739 | |
| Triton A x B^T | 104.0757365 | 64.59003281 | 51.03% |
| Torch A^T x B | 218.4317726 | 135.8922436 | |
| Triton A^T x B | 42.5237584 | 26.39050218 | 19.42% |
| Torch A^T x B^T | 150.3182902 | 93.28844194 | |
| Triton A^T x B^T | 34.43304927 | 21.36935906 | 22.91% |
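
The percentage column appears to be Triton's TFLOPS divided by Torch's TFLOPS for the same layout. A minimal sketch that recomputes it from the figures above (the helper name `triton_pct_of_torch` is hypothetical and not part of the benchmark scripts):

```python
# TFLOPS figures copied from the results above: (torch, triton) per layout.
tflops = {
    "A x B": (136.4162522, 108.8349622),
    "A x B^T": (126.5652739, 64.59003281),
    "A^T x B": (135.8922436, 26.39050218),
    "A^T x B^T": (93.28844194, 21.36935906),
}


def triton_pct_of_torch(torch_tflops, triton_tflops):
    """Return Triton throughput as a percentage of Torch throughput."""
    return 100.0 * triton_tflops / torch_tflops


# Assumption: the reported percentages are rounded to two decimals.
percentages = {
    layout: round(triton_pct_of_torch(torch, triton), 2)
    for layout, (torch, triton) in tflops.items()
}

for layout, pct in percentages.items():
    print(f"{layout}: {pct:.2f}%")
```

Recomputing the ratio this way makes it easy to track each layout's gap to oneDNN as the individual kernels improve.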

(captured 2025 Feb 1)

LIBIGC1_VERSION=2.5.11-1077
LEVEL_ZERO_VERSION=1.19.2.0-1076
AGAMA_VERSION=1077
GPU_DEVICE=Intel(R) Data Center GPU Max 1100
TORCH_VERSION=2.7.0
COMPILER_VERSION=2025.0.4

Umbrella issue to track improvements for each of the four individual GEMM kernels above.
