Implement GatherV2 #3496

Open: 25 commits to be merged into base branch develop

cognaiger9 (Collaborator) commented on Feb 11, 2025

  • Add GatherV2 operation with non-batched and batched backward kernels.
  • Add a driver and gtests for the kernels.
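
For readers unfamiliar with the operation, the backward pass of a gather is a scatter-add: each gradient slice is added back into the parameter row it was gathered from, so repeated indices accumulate. The sketch below is a pure-Python reference of that semantics only, assuming GatherV2 follows TensorFlow's `GatherV2` definition (`axis` plus `batch_dims`). The function names and the flattened `(outer, axis_size, inner)` layout are hypothetical illustrations and are not taken from this PR's kernels.

```python
def gather_v2_backward(dout, indices, outer, axis_size, inner):
    """Non-batched case (batch_dims == 0), hypothetical reference.

    dout    : flat list, logical shape (outer, n, inner), n = len(indices)
    returns : flat list, logical shape (outer, axis_size, inner)
    """
    n = len(indices)
    if len(dout) != outer * n * inner:
        raise ValueError("dout size does not match outer * n * inner")
    dparams = [0.0] * (outer * axis_size * inner)
    for o in range(outer):
        for j, idx in enumerate(indices):
            if not 0 <= idx < axis_size:
                raise IndexError(f"index {idx} out of range for axis size {axis_size}")
            for k in range(inner):
                # Scatter-add: duplicate indices accumulate into the same row.
                dparams[(o * axis_size + idx) * inner + k] += dout[(o * n + j) * inner + k]
    return dparams


def gather_v2_backward_batched(dout, indices, batch, outer, axis_size, inner):
    """Batched case (batch_dims > 0), hypothetical reference.

    indices : list of `batch` index lists, each of length n
    dout    : flat list, logical shape (batch, outer, n, inner)
    returns : flat list, logical shape (batch, outer, axis_size, inner)
    """
    if len(indices) != batch:
        raise ValueError("indices must hold one index list per batch entry")
    n = len(indices[0])
    chunk = outer * n * inner
    dparams = []
    for b in range(batch):
        # Each batch entry scatters only with its own indices.
        dparams += gather_v2_backward(
            dout[b * chunk:(b + 1) * chunk], indices[b], outer, axis_size, inner
        )
    return dparams


# Index 2 appears twice, so rows [1, 2] and [5, 6] both land in parameter row 2.
grad = gather_v2_backward([1, 2, 3, 4, 5, 6], [2, 0, 2], outer=1, axis_size=3, inner=2)
print(grad)  # [3.0, 4.0, 0.0, 0.0, 6.0, 8.0]

grad_batched = gather_v2_backward_batched(
    [1, 2, 3, 4], [[1, 1], [0, 1]], batch=2, outer=1, axis_size=2, inner=1
)
print(grad_batched)  # [0.0, 3.0, 3.0, 4.0]
```

A GPU kernel would typically parallelize the loops and resolve duplicate indices with atomic adds or a sorted reduction; which approach this PR takes is not stated in the description.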

Average improvement over ROCm

| dtype | bwd |
| --- | --- |
| float16 | 4.65 |
| float32 | 5.59 |
| bfloat16 | 5.16 |

Detailed Benchmark

float16

| op_name | dim | batch dim | dtype | param size | indices size | direction | ROCm | MIOpen | MIOpen vs ROCm (ROCm / MIOpen) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GatherV2 | 0 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 570977 | 244854 | 2.33 |
| GatherV2 | 1 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 452338 | 252338 | 1.79 |
| GatherV2 | 2 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 341809 | 105227 | 3.25 |
| GatherV2 | 3 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 387715 | 81778 | 4.74 |
| GatherV2 | 4 | 0 | float16 | [2 2 4 6 8] | [16 16] | bwd | 417489 | 84355 | 4.95 |
| GatherV2 | 0 | 0 | float16 | [2 4 8 32 64] | [8 8] | bwd | 411457 | 146649 | 2.81 |
| GatherV2 | 1 | 0 | float16 | [2 4 8 32 64] | [8 8] | bwd | 346176 | 112445 | 3.08 |
| GatherV2 | 2 | 0 | float16 | [2 4 8 32 64] | [8 8] | bwd | 499682 | 114063 | 4.38 |
| GatherV2 | 0 | 0 | float16 | [2 4 8 32 64] | [16 64] | bwd | 3129323 | 1710280 | 1.83 |
| GatherV2 | 1 | 0 | float16 | [2 4 8 32 64] | [16 64] | bwd | 1854078 | 1284180 | 1.44 |
| GatherV2 | 4 | 0 | float16 | [2 4 8 32 64] | [16 64] | bwd | 2486584 | 382152 | 6.51 |
| GatherV2 | 0 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 9583599 | 5384230 | 1.78 |
| GatherV2 | 1 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 2583113 | 1443680 | 1.79 |
| GatherV2 | 2 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 1550325 | 1038899 | 1.49 |
| GatherV2 | 3 | 0 | float16 | [4 16 32 64 64] | [8 16] | bwd | 1307337 | 633816 | 2.06 |
| GatherV2 | 0 | 0 | float16 | [16 16 32 64 128] | [16 32] | bwd | 51681692 | 43223100 | 1.20 |
| GatherV2 | 1 | 0 | float16 | [16 16 32 64 128] | [16 32] | bwd | 50925790 | 45607700 | 1.12 |
| GatherV2 | 1 | 1 | float16 | [2 4 8 32 64] | [2 8] | bwd | 287308 | 45692 | 6.29 |
| GatherV2 | 2 | 2 | float16 | [2 4 8 32 64] | [2 4] | bwd | 261031 | 32002 | 8.16 |
| GatherV2 | 4 | 2 | float16 | [2 4 8 32 64] | [2 4] | bwd | 321297 | 28944 | 11.10 |
| GatherV2 | 1 | 1 | float16 | [4 16 32 64 64] | [4 16] | bwd | 2953109 | 247983 | 11.91 |
| GatherV2 | 2 | 2 | float16 | [4 16 32 64 64] | [4 16] | bwd | 488315 | 66778 | 7.31 |
| GatherV2 | 4 | 2 | float16 | [4 16 32 64 64] | [4 16] | bwd | 645411 | 101625 | 6.35 |

float32

| op_name | dim | batch dim | dtype | param size | indices size | direction | ROCm | MIOpen | MIOpen vs ROCm (ROCm / MIOpen) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GatherV2 | 0 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 412338 | 74560 | 5.53 |
| GatherV2 | 1 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 362641 | 81280 | 4.46 |
| GatherV2 | 2 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 355501 | 43715 | 8.13 |
| GatherV2 | 3 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 352977 | 43217 | 8.17 |
| GatherV2 | 4 | 0 | float32 | [2 2 4 6 8] | [16 16] | bwd | 441666 | 49848 | 8.86 |
| GatherV2 | 0 | 0 | float32 | [2 4 8 32 64] | [8 8] | bwd | 336881 | 79929 | 4.21 |
| GatherV2 | 1 | 0 | float32 | [2 4 8 32 64] | [8 8] | bwd | 388658 | 58702 | 6.62 |
| GatherV2 | 2 | 0 | float32 | [2 4 8 32 64] | [8 8] | bwd | 361425 | 56569 | 6.39 |
| GatherV2 | 0 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 2090839 | 905531 | 2.31 |
| GatherV2 | 1 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 1279412 | 551184 | 2.32 |
| GatherV2 | 2 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 1098788 | 657868 | 1.67 |
| GatherV2 | 3 | 0 | float32 | [2 4 8 32 64] | [16 64] | bwd | 696770 | 275432 | 2.53 |
| GatherV2 | 0 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 7004803 | 4010500 | 1.75 |
| GatherV2 | 1 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 2017621 | 1158890 | 1.74 |
| GatherV2 | 2 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 1344719 | 624252 | 2.15 |
| GatherV2 | 3 | 0 | float32 | [4 16 32 64 64] | [8 16] | bwd | 1471460 | 357056 | 4.12 |
| GatherV2 | 0 | 0 | float32 | [16 16 32 64 128] | [16 32] | bwd | 37657097 | 32223999 | 1.17 |
| GatherV2 | 3 | 0 | float32 | [16 16 32 64 128] | [16 32] | bwd | 15822371 | 11934400 | 1.33 |
| GatherV2 | 1 | 1 | float32 | [2 2 4 6 8] | [2 4] | bwd | 302031 | 30544 | 9.89 |
| GatherV2 | 2 | 2 | float32 | [2 2 4 6 8] | [2 2] | bwd | 266457 | 34598 | 7.70 |
| GatherV2 | 4 | 2 | float32 | [2 2 4 6 8] | [2 2] | bwd | 278411 | 32678 | 8.52 |
| GatherV2 | 1 | 1 | float32 | [2 4 8 32 64] | [2 8] | bwd | 269914 | 28997 | 9.31 |
| GatherV2 | 2 | 2 | float32 | [2 4 8 32 64] | [2 4] | bwd | 301791 | 40998 | 7.36 |
| GatherV2 | 4 | 2 | float32 | [2 4 8 32 64] | [2 4] | bwd | 245494 | 30153 | 8.14 |
| GatherV2 | 1 | 1 | float32 | [4 16 32 64 64] | [4 16] | bwd | 2256154 | 222577 | 10.14 |
| GatherV2 | 2 | 2 | float32 | [4 16 32 64 64] | [4 16] | bwd | 562390 | 66049 | 8.51 |
| GatherV2 | 4 | 2 | float32 | [4 16 32 64 64] | [4 16] | bwd | 912272 | 114960 | 7.94 |

bfloat16

| op_name | dim | batch dim | dtype | param size | indices size | direction | ROCm | MIOpen | MIOpen vs ROCm (ROCm / MIOpen) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GatherV2 | 0 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 594514 | 236658 | 2.51 |
| GatherV2 | 1 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 472834 | 248161 | 1.91 |
| GatherV2 | 2 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 366961 | 111147 | 3.30 |
| GatherV2 | 3 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 420721 | 87947 | 4.78 |
| GatherV2 | 4 | 0 | bfloat16 | [2 2 4 6 8] | [16 16] | bwd | 493618 | 82115 | 6.01 |
| GatherV2 | 0 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 416145 | 130294 | 3.19 |
| GatherV2 | 1 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 445201 | 117511 | 3.79 |
| GatherV2 | 2 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 430690 | 107502 | 4.01 |
| GatherV2 | 3 | 0 | bfloat16 | [2 4 8 32 64] | [8 8] | bwd | 358578 | 38897 | 9.22 |
| GatherV2 | 0 | 0 | bfloat16 | [2 4 8 32 64] | [16 64] | bwd | 3190058 | 1470390 | 2.17 |
| GatherV2 | 1 | 0 | bfloat16 | [2 4 8 32 64] | [16 64] | bwd | 1907398 | 1150150 | 1.66 |
| GatherV2 | 4 | 0 | bfloat16 | [2 4 8 32 64] | [16 64] | bwd | 2475236 | 365103 | 6.78 |
| GatherV2 | 0 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 9823571 | 5387530 | 1.82 |
| GatherV2 | 1 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 2671833 | 1452310 | 1.84 |
| GatherV2 | 2 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 1596746 | 1028559 | 1.55 |
| GatherV2 | 3 | 0 | bfloat16 | [4 16 32 64 64] | [8 16] | bwd | 1274308 | 631826 | 2.02 |
| GatherV2 | 0 | 0 | bfloat16 | [16 16 32 64 128] | [16 32] | bwd | 52973827 | 43275600 | 1.22 |
| GatherV2 | 1 | 0 | bfloat16 | [16 16 32 64 128] | [16 32] | bwd | 52040938 | 45934000 | 1.13 |
| GatherV2 | 1 | 1 | bfloat16 | [2 2 4 6 8] | [2 4] | bwd | 305423 | 30651 | 9.96 |
| GatherV2 | 2 | 2 | bfloat16 | [2 2 4 6 8] | [2 2] | bwd | 264632 | 27735 | 9.54 |
| GatherV2 | 4 | 2 | bfloat16 | [2 2 4 6 8] | [2 2] | bwd | 265897 | 27095 | 9.81 |
| GatherV2 | 1 | 1 | bfloat16 | [2 4 8 32 64] | [2 8] | bwd | 298350 | 30971 | 9.63 |
| GatherV2 | 4 | 2 | bfloat16 | [2 4 8 32 64] | [2 4] | bwd | 263545 | 28339 | 9.30 |
| GatherV2 | 1 | 1 | bfloat16 | [4 16 32 64 64] | [4 16] | bwd | 3029180 | 251201 | 12.06 |
| GatherV2 | 2 | 2 | bfloat16 | [4 16 32 64 64] | [4 16] | bwd | 510302 | 66333 | 7.69 |
| GatherV2 | 4 | 2 | bfloat16 | [4 16 32 64 64] | [4 16] | bwd | 739456 | 101572 | 7.28 |
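
The last column of each benchmark table matches the ROCm value divided by the MIOpen value, so a ratio above 1 means MIOpen ran faster. The short sketch below recomputes that ratio for three rows copied from the tables and averages it per dtype. It assumes the ROCm and MIOpen columns are elapsed times in the same (unstated) unit; the helper names are hypothetical. The averages it prints cover only these three sample rows, not the full benchmark.

```python
from collections import defaultdict
from statistics import mean

# (dtype, ROCm time, MIOpen time) copied from three rows of the tables.
ROWS = [
    ("float16", 570977, 244854),
    ("float16", 452338, 252338),
    ("float32", 412338, 74560),
]


def speedup(rocm_time, miopen_time):
    """Ratio of ROCm time to MIOpen time; > 1 means MIOpen was faster."""
    return rocm_time / miopen_time


def average_speedup_by_dtype(rows):
    """Group per-row speedups by dtype and return the rounded mean of each group."""
    by_dtype = defaultdict(list)
    for dtype, rocm_time, miopen_time in rows:
        by_dtype[dtype].append(speedup(rocm_time, miopen_time))
    return {dtype: round(mean(values), 2) for dtype, values in by_dtype.items()}


per_row = [round(speedup(rocm, miopen), 2) for _, rocm, miopen in ROWS]
print(per_row)                         # [2.33, 1.79, 5.53]
print(average_speedup_by_dtype(ROWS))  # {'float16': 2.06, 'float32': 5.53}
```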

cognaiger9 self-assigned this on Feb 11, 2025
cognaiger9 marked this pull request as ready for review on February 11, 2025, 06:33