
[Blackhole Ops] Fast Dispatch Post Commit Issue Master List #12349

Open · 55 of 80 tasks · abhullar-tt opened this issue Sep 6, 2024 · 14 comments

abhullar-tt (Contributor) commented Sep 6, 2024

This is a master list of all tests that had to be disabled to get fast-dispatch-build-and-unit-tests.yaml passing on BH (Blackhole).

Goal: Remove the instances of skip_for_blackhole that are not also skipped for WH (Wormhole).
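For context, a minimal sketch of how a skip_for_blackhole helper can be built on pytest.mark.skipif. Detecting the architecture via ARCH_NAME is an assumption for illustration; the repo's actual helper may detect the device differently.

```python
# Minimal sketch of a skip_for_blackhole helper built on pytest.mark.skipif.
# Reading the target architecture from ARCH_NAME is an assumption for
# illustration; the real helper may detect the device differently.
import os

import pytest


def is_blackhole() -> bool:
    # Hypothetical detection: treat ARCH_NAME=blackhole as a BH run.
    return os.environ.get("ARCH_NAME", "").lower() == "blackhole"


def skip_for_blackhole(reason: str = "not working on Blackhole"):
    return pytest.mark.skipif(is_blackhole(), reason=reason)


# The goal of this issue is to delete decorators like the one below
# wherever the same test is not also skipped for WH.
@skip_for_blackhole("output mismatch, tracked in #12349")
def test_example():
    assert 1 + 1 == 2
```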

Matmul

Softmax

LayerNorm

Moreh Ops

  • misc/test_moreh_getitem.py: mismatching
  • misc/test_moreh_sum.py::test_moreh_sum_integer: mismatching
  • misc/test_moreh_layernorm.py::test_moreh_layernorm_backward: mismatching
  • misc/test_moreh_nll_loss.py: mismatching
  • misc/test_moreh_nll_loss_unreduced.py: mismatching

Loss Ops

  • unit_testing/loss_ops/test_loss_mae.py: mismatching
  • unit_testing/loss_ops/test_loss_mse.py: mismatching

Downsample

  • unit_testing/misc/test_downsample.py: mismatching

Multi Op

  • misc/test_create_qkv_heads.py::test_nlp_create_qkv_heads_test: L1 buffer and CB crash
  • misc/test_scaled_dot_product_attention.py: mismatching; the only failing test is the one using high-fidelity/float32 for calculations. May be similar to the other float32 issues.
  • misc/test_scaled_dot_product_attention_decode.py
  • misc/test_nlp_create_qkv_heads_decode.py: requires eth-connected devices

Rotary Embeddings (not a TM; performs a positional transformation using a matmul)

TMs

LLKs

Reduce

Misc.

Sweeps

@abhullar-tt abhullar-tt changed the title Blackhole Post Commit Op Issue Master List Blackhole Fast Dispatch Post Commit Op Issue Master List Sep 9, 2024
@abhullar-tt abhullar-tt changed the title Blackhole Fast Dispatch Post Commit Op Issue Master List [Blackhole Ops] Fast Dispatch Post Commit Issue Master List Sep 9, 2024
ayerofieiev-tt (Member) commented:

@mywoodstock @tarafdarTT @sjameelTT @bbradelTT @yugaoTT @razorback3 FYI

smehtaTT commented:

FYI: the CNN list for Blackhole is tracked at #11124

ntarafdar (Contributor) commented Sep 12, 2024

@abhullar-tt I updated the list to include my individual issues tracking this.
@sjameelTT will look at two of the issues (transpose and permute); I'll look at the other 3.

yugaoTT (Contributor) commented Sep 12, 2024

Lots of tests fail when the input is in float32 format. Is float32 unsupported or buggy on BH?

abhullar-tt (Contributor, Author) commented:

> Lots of tests fail when the input is in float32 format. Is float32 unsupported or buggy on BH?

@rtawfik01 @nvelickovicTT do you know if there are issues with float32?

rtawfik01 (Contributor) commented:

@yugaoTT float32 has been tested on Blackhole; can you provide the simplest test case that fails with float32, preferably a single-tile case? It's more likely a metal API issue where the fp32 flag is not being fully passed to the Blackhole LLKs.
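A minimal single-tile repro along those lines might look like the sketch below. The from_torch/to_torch flow is standard ttnn usage, but the choice of op (eltwise add) and the expectation on error size are assumptions, not a confirmed failing case.

```python
# Sketch of a minimal single-tile (32x32) float32 repro for ttnn.
# The from_torch/to_torch flow is standard ttnn usage; the choice of op
# (add) and what counts as a failure are placeholders, not from the issue.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

torch_a = torch.rand((32, 32), dtype=torch.float32)
torch_b = torch.rand((32, 32), dtype=torch.float32)

a = ttnn.from_torch(torch_a, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)
b = ttnn.from_torch(torch_b, dtype=ttnn.float32, layout=ttnn.TILE_LAYOUT, device=device)

out = ttnn.to_torch(ttnn.add(a, b))

# float32 should be near-exact for a single-tile add; a large error here
# would point at the fp32 path (e.g. the fp32 flag not reaching the LLKs).
print("max abs error:", (out - (torch_a + torch_b)).abs().max().item())

ttnn.close_device(device)
```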

skhorasganiTT (Contributor) commented:

These ops from Falcon7b seem to be missing; it would be good to double-check whether they're failing as well:

  • ttnn.experimental.nlp_create_qkv_heads_falcon7b (tests/tt_eager/python_api_testing/unit_testing/misc/test_nlp_create_qkv_heads.py)
  • ttnn.experimental.nlp_concat_heads (tests/tt_eager/python_api_testing/unit_testing/misc/test_nlp_concat_heads.py)

abhullar-tt (Contributor, Author) commented:

> These ops from Falcon7b seem to be missing; it would be good to double-check whether they're failing as well:
>
>   • ttnn.experimental.nlp_create_qkv_heads_falcon7b (tests/tt_eager/python_api_testing/unit_testing/misc/test_nlp_create_qkv_heads.py)
>   • ttnn.experimental.nlp_concat_heads (tests/tt_eager/python_api_testing/unit_testing/misc/test_nlp_concat_heads.py)

BH post commit runs everything that post commit on main runs unless it is explicitly tagged with skip_for_blackhole, so we shouldn't have to do any additional checks. BH post commit is currently down because of some FW issues, but once it is back up I can check its status.

uaydonat (Contributor) commented:

> BH post commit runs everything that post commit on main runs unless it is explicitly tagged with skip_for_blackhole, so we shouldn't have to do any additional checks. BH post commit is currently down because of some FW issues, but once it is back up I can check its status.

Thanks @abhullar-tt. We can use the feedback from Colman, Salar, etc. as a sanity check. After running the BH post commit, we should double-check that these ops are on the list. If they are missing, then we should re-evaluate our testing coverage.

abhullar-tt (Contributor, Author) commented:

> Thanks @abhullar-tt. We can use the feedback from Colman, Salar, etc. as a sanity check. After running the BH post commit, we should double-check that these ops are on the list. If they are missing, then we should re-evaluate our testing coverage.

BH post commit is clean on main, and the ops listed by Salar are covered. From Colman's list:

  • misc/test_scaled_dot_product_attention_decode.py is skipped on BH, so it has been added to the top list
  • misc/test_nlp_concat_heads_decode.py is included in the post commit tests
  • tests/ttnn/unit_tests/operations/test_paged_update_cache.py → not sure where this gets tested @cglagovichTT

cglagovichTT (Contributor) commented:

Paged update cache is tested in this post commit pipeline: https://github.com/tenstorrent/tt-metal/actions/runs/11815965526/job/32918624681#logs

sjameelTT added commits that referenced this issue (Dec 10 to Dec 17, 2024):
#12550: re-enable some permute tests, disable the ones that aren't working
sjameelTT added a commit that referenced this issue Dec 17, 2024
…d do a minor refactor of ttnn::permute (#15881)

### Ticket
#14790 add transpose wh sharded implementation when shard shape < height dimension
#15165 add N-d permute with width dimension
#15589 correct permute dimensionality when less than 4D
#15750 remove the composite flag from permute
#12550 re-enable some permute tests for blackhole
#12349 re-enable working transpose tests for blackhole
#16066 disable test uniform as it's stochastic

### Problem description
This PR addresses several permute and transpose problems at once:

- Transpose WH sharded does not currently work when the shard shape is less than the height
- Permute on greater than 4 dimensions does not work when moving the width dimension (for both tiled and RM)
- The permute kernel is single-core when the width dimension doesn't change
- Permute has an unclean API with a composite flag that is not generically applicable
- Permute on less than 4 dimensions gets an incorrect output shape in cases where it's a no-op
- Permute tests are disabled for BH due to LLK issues
- Transpose tests are disabled for BH due to LLK issues

### What's changed
- Add a transpose WH sharded implementation for when the shard shape is less than the height dim (outputs a block sharded output)
- Add an N-d permute kernel that works generically on any row major input. We have to call a global init on each loop of the compute kernel, as transpose sets some registers that aren't cleared (there's no transpose_uninit); otherwise we get bad PCC when there's more than one loop. For GS/BH, even the global init doesn't solve the problem, so that test is disabled. For tiled inputs, we need 5D untilize/tilize. This increases sweeps coverage for permute from **50%** to **86%**
- For the optimized case where permute's width dimension is not shuffled, make the kernel multicore
- Remove the composite flag, which was set by default and made permute non-generic. It has caused forge models to have bad PCC, as they were not aware of this optional argument.
- Refactor ttnn::permute to add no-op checks and correct shape calculations
- Re-enable permute and transpose tests for blackhole

When replacing variants of transpose with this RM permute kernel, a lot of tests on BH/GS failed, so I will address that in a follow-up; the LLK issues are causing pain there. Once we get N-d untilize/tilize support and the LLK issues are fixed, permute should be able to be fully generic. The remaining issue for the pytorch 2.0 sweeps after the untilize/tilize fix is the CB overflow on transpose WH, which should be fixed out of the box when we replace the kernel that is used (which I am not doing in this PR since it doesn't work for GS/BH at the moment).
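As an illustration of the under-4D no-op case the refactor fixes, a sketch follows; the shape and permutation are illustrative only, not taken from the PR's tests.

```python
# Sketch of the less-than-4D no-op permute case addressed by the refactor.
# The shape and permutation below are illustrative, not from the PR's tests.
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# An identity permutation on a 3D tensor is a no-op: the output shape must
# equal the input shape, which is what the shape-calculation fix ensures.
x = ttnn.from_torch(torch.rand((2, 32, 32)), layout=ttnn.TILE_LAYOUT, device=device)
y = ttnn.permute(x, (0, 1, 2))
print(y.shape)  # expected: [2, 32, 32]

ttnn.close_device(device)
```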

### Checklist
- [x] Post commit CI passes
https://github.com/tenstorrent/tt-metal/actions/runs/12367177499/job/34547311782
(failing test is failing on main)
- [x] Blackhole Post commit (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12367175575
- [x] Model regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12357119737
- [x] Device performance regression CI testing passes (if applicable)
https://github.com/tenstorrent/tt-metal/actions/runs/12357115316
- [ ] **(For models and ops writers)** Full [new models](https://github.com/tenstorrent/tt-metal/actions/workflows/full-new-models-suite.yaml) tests pass
- [ ] New/Existing tests provide coverage for changes
@prajaramanTT prajaramanTT added this to the BHLD milestone Jan 10, 2025