Merge OpenAI Triton commit `b39c1e1` #3302

whitneywhtsang · 2025-01-28T22:43:52Z

This PR changes the Triton base from 0ecb172 to b39c1e1 (Jan 28).
Pass rate: 99.44% -> 98.19% (#3307)

Please do not squash and merge this PR.

This PR uses a more efficient approach for the type conversion from fp32 to bf16 in the HIP backend. In a simple unit test, the number of VGPRs used decreases from 18 to 10.
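For readers unfamiliar with the conversion itself, here is a minimal host-side sketch of fp32 to bf16 with round-to-nearest-even, written as plain C++ bit manipulation. It is not the code from #5633, which lowers to AMD-specific instructions that this page does not show. The function names `fp32_to_bf16_rne`, `bf16_to_fp32`, and `check_rounding_examples` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative only: a generic software fp32 -> bf16 conversion using
// round-to-nearest-even. Names are hypothetical, not taken from Triton.
inline std::uint16_t fp32_to_bf16_rne(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));  // well-defined type punning

  // NaN: keep the top 16 bits and force a mantissa bit so it stays a NaN.
  if ((bits & 0x7fffffffu) > 0x7f800000u) {
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  }

  // Add 0x7fff plus the lowest kept bit, so exact ties round to even.
  std::uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7fffu + lsb;
  return static_cast<std::uint16_t>(bits >> 16);
}

// Widening back to fp32 is exact: bf16 is the upper half of an fp32.
inline float bf16_to_fp32(std::uint16_t half) {
  std::uint32_t bits = static_cast<std::uint32_t>(half) << 16;
  float out;
  std::memcpy(&out, &bits, sizeof(out));
  return out;
}

// Small usage example covering an exact value and two tie cases.
inline void check_rounding_examples() {
  assert(fp32_to_bf16_rne(1.0f) == 0x3f80);
  assert(fp32_to_bf16_rne(1.00390625f) == 0x3f80);  // tie, rounds down to even
  assert(fp32_to_bf16_rne(1.01171875f) == 0x3f82);  // tie, rounds up to even
}
```

A GPU backend would typically express the same steps as a handful of integer instructions per element, which is why the choice of sequence can affect register usage.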

This is a minor change. When implementing PR #5549 I used `rewriter.notifyMatchFailure` in place of `return failure();`, as suggested, to leverage the MLIR infrastructure for error reporting. We should probably be consistent throughout the file and use the MLIR infrastructure for the other buffer ops.

https://github.com/triton-lang/triton/pull/5542/files introduced a dependency on `unordered_map` without including the `<unordered_map>` header. This currently works on systems where other headers happen to include it transitively, but fails on others.
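As a generic illustration of the fix pattern (not the actual file touched by #5721), the sketch below includes `<unordered_map>` directly in the translation unit that uses it instead of relying on another header to pull it in. The helper `count_for` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>  // included directly because this file uses it

// Hypothetical helper: returns the stored count for `word`, or 0 if absent.
inline int count_for(const std::unordered_map<std::string, int> &counts,
                     const std::string &word) {
  assert(!word.empty() && "word must be non-empty");
  auto it = counts.find(word);
  return it == counts.end() ? 0 : it->second;
}
```

Whether a transitive include happens to be available depends on the standard library implementation and version, which is why the omission can go unnoticed on one toolchain and break the build on another.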

@depaulmillz

Initial support for Nvidia Blackwell GPUs (sm_100). The key contributions included in this PR are:

* Support for 5th generation Tensor Core.
* Modeling and support of Tensor Memory.
* Native support for microscaling formats mxfp4 and mxfp8.
* Improvements to the software pipeliner to take advantage of Tensor Cores and Tensor memory

This was developed in close collaboration between Nvidia and OpenAI.

From Nvidia:

* dePaul Miller (@depaulmillz)
* Samantha Hirsch (@Sam3077)
* Yujia Zhai (@yzhaiustc)
* Shang Zhang (@shangz-ai)
* Pradeep Ramani (@IonThruster)
* Matthew Brookhart (@mbrookhart)
* Masahiro Masuda (@masahi)
* Chris Sullivan (@csullivan)
* Clive Unger (@CliveUnger)
* Jason Knight (@binarybana)

From OpenAI:

* Pawel Szczerbuk (@pawelszczerbuk)
* Peter Bell (@peterbell10)
* Phil Tillet (@ptillet)
* Jeff Niu (@jeffniu-openai)
* Thomas Raoux (@ThomasRaoux)

Co-authored-by: Baogang Song <[email protected]>
Co-authored-by: Pawel Szczerbuk <[email protected]>
Co-authored-by: Sergei Vorobev <[email protected]>
Co-authored-by: ionthruster <[email protected]>
Co-authored-by: dePaul Miller <[email protected]>
Co-authored-by: Matthew Brookhart <[email protected]>
Co-authored-by: Chris Sullivan <[email protected]>
Co-authored-by: Masahiro Masuda <[email protected]>
Co-authored-by: Chris Sullivan <[email protected]>
Co-authored-by: peterbell10 <[email protected]>
Co-authored-by: jeffniu-openai <[email protected]>

scxiao and others added 4 commits January 27, 2025 10:08

[AMD] Use more efficient fp32 to bf16 type conversion (#5633)

1c28e08

Add missing include header (#5721)

47c730b

whitneywhtsang requested review from pbchekin and anmyachev January 28, 2025 22:43

whitneywhtsang self-assigned this Jan 28, 2025

anmyachev approved these changes Jan 28, 2025

whitneywhtsang added 3 commits January 29, 2025 18:59

Merge commit '47c730b33a4dfe6d38f9a80b98801917199da94e'

2c900de

Merge commit 'b39c1e14b8f2029bc6a8798e4914d2692edf97d8'

2a7a540

Temporarily add back LLVM::getStackPointer

8545a9f

Fix build failure from `b39c1e1`. Signed-off-by: Whitney Tsang <[email protected]>

whitneywhtsang force-pushed the whitneywhtsang/merge branch from 4d8da9e to 8545a9f Compare January 29, 2025 19:19

whitneywhtsang marked this pull request as ready for review January 29, 2025 19:19

whitneywhtsang force-pushed the whitneywhtsang/merge branch 2 times, most recently from 8a24902 to 37388dd Compare January 29, 2025 19:44

[WIN] Fix HTTP Error 404: Not Found

d5d889e

Signed-off-by: Whitney Tsang <[email protected]>

whitneywhtsang force-pushed the whitneywhtsang/merge branch from 37388dd to d5d889e Compare January 29, 2025 19:53

[WIN] Fix `error C7555: use of designated initializers requires at least '/std:c++20'`

d18a065

Signed-off-by: Whitney Tsang <[email protected]>

whitneywhtsang force-pushed the whitneywhtsang/merge branch 4 times, most recently from 1d45243 to 8852bb3 Compare January 29, 2025 23:38

[TEST] Fix UT and tutorial failures from b39c1e1

3f44826

Signed-off-by: Whitney Tsang <[email protected]>

whitneywhtsang force-pushed the whitneywhtsang/merge branch from 8852bb3 to 3f44826 Compare January 29, 2025 23:42

whitneywhtsang changed the title ~~Merge OpenAI Triton commit 47c730b~~ Merge OpenAI Triton commit b39c1e1 Jan 30, 2025

whitneywhtsang mentioned this pull request Jan 30, 2025

[TEST] Fix test_matmul.py failures #3307

Open

whitneywhtsang merged commit 3f44826 into main Jan 30, 2025
5 checks passed

whitneywhtsang deleted the whitneywhtsang/merge branch January 30, 2025 14:30

whitneywhtsang mentioned this pull request Feb 3, 2025

Merge OpenAI Triton till Feb 1st #3211

Closed