[DRAFT][s4xbf16] Add shmem swizzling heuristics for loading into LinearLayouts #23

ggengnv · 2025-02-26T00:18:59Z

Do not merge until rebased on upstream Triton up to triton-lang@c1ed673

This is 1 of the 2 patches needed to improve int4xbf16 GEMM perf.

This improves shared-memory (shmem) swizzling for loads into LinearLayouts. Efficient int4 upcasting relies on join/reshape, and with those ops the propagated layout is a LinearLayout rather than a DotOp layout. Triton currently falls back to an unswizzled shmem layout in that case, which is suboptimal.
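For readers unfamiliar with the join/reshape pattern, here is a rough pure-Python sketch of what int4 unpacking looks like conceptually. It is not code from this PR and not a Triton kernel. The helper names and the nibble order (low nibble first) are assumptions made for illustration only.

```python
def sign_extend_int4(nibble):
    # Map an unsigned 4-bit value (0..15) to a signed one (-8..7).
    return nibble - 16 if nibble >= 8 else nibble


def unpack_int4(packed):
    # Each byte holds two int4 values. Assumption: low nibble comes first.
    lo = [sign_extend_int4(b & 0xF) for b in packed]
    hi = [sign_extend_int4((b >> 4) & 0xF) for b in packed]
    # "join": pair the two halves along a new minor dimension, shape (N, 2).
    joined = [[l, h] for l, h in zip(lo, hi)]
    # "reshape": flatten (N, 2) into (2N,), which interleaves lo and hi.
    return [v for pair in joined for v in pair]


packed = [0x21, 0xF8]
print(unpack_int4(packed))  # [1, 2, -8, -1]
```

The interleaving produced by the join followed by the reshape is what leaves the result in a layout that is no longer a plain DotOp layout.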

This PR adds high-level heuristics to generate a swizzled layout for the above case.
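To illustrate why an unswizzled layout is costly, here is a minimal sketch of generic XOR swizzling. It is not the heuristic added by this PR. The bank count, tile width, and function names are assumptions chosen for the example (32 banks of 4-byte words, a tile 32 words wide).

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 shared-memory banks, one 4-byte word each
TILE_COLS = 32  # assumption: each tile row is 32 words wide


def unswizzled_offset(row, col):
    # Plain row-major word offset.
    return row * TILE_COLS + col


def xor_swizzled_offset(row, col):
    # Permute columns within each row by XOR-ing with the row index.
    return row * TILE_COLS + (col ^ (row % TILE_COLS))


def max_bank_conflict(offset_fn, accesses):
    # Largest number of simultaneous accesses that land in the same bank.
    banks = Counter(offset_fn(r, c) % NUM_BANKS for r, c in accesses)
    return max(banks.values())


# 32 threads each read column 0 of a different row.
column_read = [(row, 0) for row in range(32)]
print(max_bank_conflict(unswizzled_offset, column_read))    # 32
print(max_bank_conflict(xor_swizzled_offset, column_read))  # 1
```

In this toy model the column read is fully serialized without swizzling and conflict-free with it, while row reads stay conflict-free in both cases because the XOR only permutes words within a row.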

cc @gflegar

Add shmem swizzling heuristic

4314a38
