
add dsv3 mi300 triton config for block scale #3146

Merged
41 commits merged into sgl-project:main on Jan 27, 2025

Conversation

@BruceXcluding (Contributor) commented on Jan 26, 2025:

Motivation

Tune MoE:

python benchmark/kernels/fused_moe_triton/tuning_fused_moe_triton.py --model deepseek-ai/DeepSeek-V3 --tp-size 8 --dtype fp8_w8a8 --tune

With the tuned configs, offline performance improves by roughly 20%; server performance shows no obvious change.
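
For context, tuning_fused_moe_triton.py writes a JSON config keyed by batch size, which the fused MoE kernel looks up at runtime to pick its tile sizes. A minimal Python sketch of one entry follows; the key names follow the fused_moe_triton config format, but the values are illustrative assumptions, not this PR's actual tuned numbers:

# Illustrative fused MoE Triton config entry, keyed by batch size (token count).
# The values below are assumed for illustration; real entries come from the
# tuning run and are saved per device/dtype (e.g. MI300, fp8_w8a8, block scale).
config = {
    "1": {
        "BLOCK_SIZE_M": 16,    # tile rows per Triton program
        "BLOCK_SIZE_N": 128,   # tile columns per Triton program
        "BLOCK_SIZE_K": 128,   # reduction tile, aligned to the block-scale granularity
        "GROUP_SIZE_M": 1,     # program grouping for L2 cache reuse
        "num_warps": 4,
        "num_stages": 2,
    },
}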

Modifications

Checklist

@BruceXcluding (Contributor, Author) commented:

@HaiShaw Please check

@zhyncs requested a review from HandH1998 on Jan 26, 2025 at 09:37
@HaiShaw (Collaborator) commented on Jan 26, 2025:

Maybe fix the following text/example usage to match what will be in main?

Gemm Tune with private repo
export TRITON_PRINT_AUTOTUNING=1
python tune_gemm.py

@HaiShaw (Collaborator) left a review comment:

LGTM after minors addressed, thanks!

@BruceXcluding (Contributor, Author) replied:

> Maybe fix the following text/example usage to match what will be in main?
>
> Gemm Tune with private repo
> export TRITON_PRINT_AUTOTUNING=1
> python tune_gemm.py

I'll remove this part of the description. If a tune_gemm example is added to main, I will add the ROCm configs.

@merrymercy (Contributor) commented:

Ready to be merged?

@zhyncs merged commit 351a72d into sgl-project:main on Jan 27, 2025
9 of 15 checks passed
chongli-uw pushed a commit to chongli-uw/sglang that referenced this pull request on Feb 15, 2025