
[Blackhole Ops] Fast Dispatch Post Commit Issue Master List #12349

Open · 55 of 80 tasks · abhullar-tt opened this issue Sep 6, 2024 · 14 comments

abhullar-tt (Contributor) commented Sep 6, 2024

This is a master list of all tests that had to be disabled to get fast-dispatch-build-and-unit-tests.yaml passing on BH (Blackhole).

Goal: Remove the instances of skip_for_blackhole that are not also skipped for WH (Wormhole).
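For context, a minimal sketch of how a skip_for_blackhole helper can be built on pytest.mark.skipif. Detecting the architecture via ARCH_NAME is an assumption for illustration; the repo's actual helper may detect the device differently.

```python
# Minimal sketch of a skip_for_blackhole helper built on pytest.mark.skipif.
# Reading the target architecture from ARCH_NAME is an assumption for
# illustration; the real helper may detect the device differently.
import os

import pytest


def is_blackhole() -> bool:
    # Hypothetical detection: treat ARCH_NAME=blackhole as a BH run.
    return os.environ.get("ARCH_NAME", "").lower() == "blackhole"


def skip_for_blackhole(reason: str = "not working on Blackhole"):
    return pytest.mark.skipif(is_blackhole(), reason=reason)


# The goal of this issue is to delete decorators like the one below
# wherever the same test is not also skipped for WH.
@skip_for_blackhole("output mismatch, tracked in #12349")
def test_example():
    assert 1 + 1 == 2
```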

Matmul

Softmax

LayerNorm

Moreh Ops

  • misc/test_moreh_getitem.py: mismatching
  • misc/test_moreh_sum.py::test_moreh_sum_integer: mismatching
  • misc/test_moreh_layernorm.py::test_moreh_layernorm_backward: mismatching
  • misc/test_moreh_nll_loss.py: mismatching
  • misc/test_moreh_nll_loss_unreduced.py: mismatching

Loss Ops

  • unit_testing/loss_ops/test_loss_mae.py: mismatching
  • unit_testing/loss_ops/test_loss_mse.py: mismatching

Downsample

  • unit_testing/misc/test_downsample.py: mismatching

Multi Op

  • misc/test_create_qkv_heads.py::test_nlp_create_qkv_heads_test: L1 buffer and CB crash
  • misc/test_scaled_dot_product_attention.py: mismatching; the only failing test is the one using high-fidelity/float32 for calculations. May be similar to the other float32 issues.
  • misc/test_scaled_dot_product_attention_decode.py
  • misc/test_nlp_create_qkv_heads_decode.py: requires eth-connected devices

Rotary Embeddings (not a TM; performs a positional transformation using a matmul)

TMs

LLKs

Reduce

Misc.

Sweeps

@abhullar-tt abhullar-tt changed the title Blackhole Post Commit Op Issue Master List Blackhole Fast Dispatch Post Commit Op Issue Master List Sep 9, 2024
@abhullar-tt abhullar-tt changed the title Blackhole Fast Dispatch Post Commit Op Issue Master List [Blackhole Ops] Fast Dispatch Post Commit Issue Master List Sep 9, 2024
ayerofieiev-tt (Member) commented:

@mywoodstock @tarafdarTT @sjameelTT @bbradelTT @yugaoTT @razorback3 FYI

smehtaTT commented:

FYI: the CNN list for Blackhole is tracked at #11124

ntarafdar (Contributor) commented Sep 12, 2024

@abhullar-tt I updated the list to include my individual issues tracking this.
@sjameelTT will look at two of the issues (transpose and permute); I'll look at the other 3.

yugaoTT (Contributor) commented Sep 12, 2024

Lots of tests fail when the input is in float32 format. Is float32 unsupported or buggy on BH?

abhullar-tt (Contributor, Author) commented:

> Lots of tests fail when the input is in float32 format. Is float32 unsupported or buggy on BH?

@rtawfik01 @nvelickovicTT do you know if there are issues with float32?

rtawfik01 (Contributor) commented:

@yugaoTT float32 has been tested on Blackhole; can you provide the simplest test case that fails with float32, preferably a single-tile case? It's more likely a metal API issue where the fp32 flag is not being fully passed to the Blackhole LLKs.
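A minimal single-tile repro along those lines might look like the sketch below. The from_torch/to_torch flow is standard ttnn usage, but the choice of op (eltwise add) and the expectation on error size are assumptions, not a confirmed failing case.

```python
# Sketch of a minimal single-tile (32x32) float32 repro for ttnn.
# The from_torch/to_torch flow is standard ttnn usage; the choice of op
# (add) and what counts as a failure are placeholders, not from the issue.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

torch_a = torch.rand((32, 32), dtype=torch.float32)
torch_b = torch.rand((32, 32), dtype=torch.float32)

a = ttnn.from_torch(torch_a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.from_torch(torch_b, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)

out = ttnn.to_torch(ttnn.add(a, b))

# float32 should be near-exact for a single-tile add; a large error here
# would point at the fp32 path (e.g. the fp32 flag not reaching the LLKs).
print("max abs error:", (out - (torch_a + torch_b)).abs().max().item())

ttnn.close_device(device)
```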

skhorasganiTT (Contributor) commented:

These ops from Falcon7b seem to be missing; it would be good to double-check whether they're failing as well:

  • ttnn.experimental.nlp_create_qkv_heads_falcon7b (tests/tt_eager/python_api_testing/unit_testing/misc/test_nlp_create_qkv_heads.py)
  • ttnn.experimental.nlp_concat_heads (tests/tt_eager/python_api_testing/unit_testing/misc/test_nlp_concat_heads.py)

abhullar-tt (Contributor, Author) commented:

> These ops from Falcon7b seem to be missing; it would be good to double-check whether they're failing as well:
>
>   • ttnn.experimental.nlp_create_qkv_heads_falcon7b (tests/tt_eager/python_api_testing/unit_testing/misc/test_nlp_create_qkv_heads.py)
>   • ttnn.experimental.nlp_concat_heads (tests/tt_eager/python_api_testing/unit_testing/misc/test_nlp_concat_heads.py)

BH post commit runs everything that post commit on main runs unless it is explicitly tagged with skip_for_blackhole, so we shouldn't have to do any additional checks. BH post commit is currently down because of some FW issues, but once it is back up I can check its status.

uaydonat (Contributor) commented:

> BH post commit runs everything that post commit on main runs unless it is explicitly tagged with skip_for_blackhole, so we shouldn't have to do any additional checks. BH post commit is currently down because of some FW issues, but once it is back up I can check its status.

Thanks @abhullar-tt. We can use the feedback from Colman, Salar, etc. as a sanity check. After running the BH post commit, we should double-check that these ops are on the list. If they are missing, then we should re-evaluate our testing coverage.

abhullar-tt (Contributor, Author) commented:

> Thanks @abhullar-tt. We can use the feedback from Colman, Salar, etc. as a sanity check. After running the BH post commit, we should double-check that these ops are on the list. If they are missing, then we should re-evaluate our testing coverage.

BH post commit is clean on main, and the ops listed by Salar are covered. From Colman's list:

  • misc/test_scaled_dot_product_attention_decode.py is skipped on BH, so it has been added to the top list
  • misc/test_nlp_concat_heads_decode.py is included in the post commit tests
  • tests/ttnn/unit_tests/operations/test_paged_update_cache.py → not sure where this gets tested @cglagovichTT

cglagovichTT (Contributor) commented:

Paged update cache is tested in this post commit pipeline: https://github.com/tenstorrent/tt-metal/actions/runs/11815965526/job/32918624681#logs

sjameelTT added commits that referenced this issue (Dec 10 to Dec 17, 2024):
#12550: re-enable some permute tests, disable the ones that aren't working
sjameelTT added a commit that referenced this issue Dec 17, 2024
…d do a minor refactor of ttnn::permute (#15881)

### Ticket
#14790 add transpose wh sharded implementation when shard shape < height dimension
#15165 add N-d permute with width dimension
#15589 correct permute dimensionality when less than 4D
#15750 remove the composite flag from permute
#12550 re-enable some permute tests for blackhole
#12349 re-enable working transpose tests for blackhole
#16066 disable test uniform as it's stochastic

### Problem description
This PR addresses several permute and transpose problems at once:

- Transpose WH sharded does not currently work when the shard shape is less than the height
- Permute on greater than 4 dimensions does not work when moving the width dimension (for both tiled and RM)
- The permute kernel is single-core when the width dimension doesn't change
- Permute has an unclean API with a composite flag that is not generically applicable
- Permute on less than 4 dimensions gets an incorrect output shape in cases where it's a no-op
- Permute tests are disabled for BH due to LLK issues
- Transpose tests are disabled for BH due to LLK issues

### What's changed
- Add a transpose WH sharded implementation for when the shard shape is less than the height dim (outputs a block sharded output)
- Add an N-d permute kernel that works generically on any row major input. We have to call a global init on each loop of the compute kernel, as transpose sets some registers that aren't cleared (there's no transpose_uninit); otherwise we get bad PCC when there's more than one loop. For GS/BH, even the global init doesn't solve the problem, so that test is disabled. For tiled inputs, we need 5D untilize/tilize. This increases sweeps coverage for permute from **50%** to **86%**
- For the optimized case where permute's width dimension is not shuffled, make the kernel multicore
- Remove the composite flag, which was set by default and made permute non-generic. It has caused forge models to have bad PCC, as they were not aware of this optional argument.
- Refactor ttnn::permute to add no-op checks and correct shape calculations
- Re-enable permute and transpose tests for blackhole

When replacing variants of transpose with this RM permute kernel, a lot of tests on BH/GS failed, so I will address that in a follow-up; the LLK issues are causing pain there. Once we get N-d untilize/tilize support and the LLK issues are fixed, permute should be able to be fully generic. The remaining issue for the pytorch 2.0 sweeps after the untilize/tilize fix is the CB overflow on transpose WH, which should be fixed out of the box when we replace the kernel that is used (which I am not doing in this PR since it doesn't work for GS/BH at the moment).
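As an illustration of the under-4D no-op case the refactor fixes, a sketch follows; the shape and permutation are illustrative only, not taken from the PR's tests.

```python
# Sketch of the less-than-4D no-op permute case addressed by the refactor.
# The shape and permutation below are illustrative, not from the PR's tests.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# An identity permutation on a 3D tensor is a no-op: the output shape must
# equal the input shape, which is what the shape-calculation fix ensures.
x = ttnn.from_torch(torch.rand((2, 32, 32)), layout=ttnn.TILE_LAYOUT, device=device)
y = ttnn.permute(x, (0, 1, 2))
print(y.shape)  # expected: [2, 32, 32]

ttnn.close_device(device)
```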

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12367177499/job/34547311782
(failing test is failing on main)
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12367175575
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12357119737
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12357115316
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests pass
- [ ] New/Existing tests provide coverage for changes
@prajaramanTT prajaramanTT added this to the BHLD milestone Jan 10, 2025