#16171: Preload kernels before receiving go message #16680

Merged
merged 1 commit into from
Jan 16, 2025


jbaumanTT
Contributor

@jbaumanTT jbaumanTT commented Jan 13, 2025

Ticket

#16171

Problem description

Currently it takes around 750 ns (ideally) between a core sending its acknowledgement that it's finished a kernel and receiving the GO message for the next kernel. That time could better be spent preparing to load the next kernel.

What's changed

Add a flag that lets brisc.cc and erisc.cc start loading kernels before receiving a go message. Fast dispatch ensures that the flag will be set only after all necessary program data is sent to the core.

This allows preparation for the following kernel (including loading NCRISC IRAM, setting up CBs, and initializing local memory) to happen in parallel with the round-trip to the dispatcher_s to sync up with the other kernels and ensure that they're all launched at the same time.
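
The overlap described above can be illustrated with a small host-side sketch. This is not the firmware code from this PR: `Mailbox`, `Core`, `poll`, and `launch_latency_ns` are hypothetical names made up for illustration, and the real flag layout in dev_msgs.h may look quite different. The sketch only assumes what the description states: a preload flag that is set once all program data has arrived, and a go message that arrives later.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mailbox; field names are illustrative, not those in dev_msgs.h.
struct Mailbox {
    bool preload = false;  // set by dispatch only after all program data is on the core
    bool go = false;       // set once the dispatcher has synced all cores
};

// Hypothetical model of a core's firmware loop.
struct Core {
    bool prepared = false;
    std::vector<std::string> log;  // records the order of events

    // Stands in for loading NCRISC IRAM, setting up CBs, initializing local memory.
    void prepare() {
        if (prepared) return;
        log.push_back("prepare");
        prepared = true;
    }

    // One iteration of the polling loop. Returns true if the kernel was launched.
    bool poll(Mailbox& mb) {
        if (mb.preload) prepare();  // start early, before the go message
        if (!mb.go) return false;   // still waiting for the dispatcher round trip
        prepare();                  // fallback path when no preload flag was seen
        assert(prepared);
        log.push_back("run");
        mb = Mailbox{};             // clear both flags for the next kernel
        prepared = false;
        return true;
    }
};

// Simplified latency model (an assumption, ignoring all other overheads):
// with preload, preparation overlaps the round trip; without it, they add up.
constexpr std::uint32_t launch_latency_ns(std::uint32_t round_trip_ns,
                                          std::uint32_t prep_ns,
                                          bool preload) {
    if (!preload) return round_trip_ns + prep_ns;
    return round_trip_ns > prep_ns ? round_trip_ns : prep_ns;
}
```

With the roughly 750 ns round trip mentioned above and a made-up 400 ns preparation time, this model gives 750 ns with preloading versus 1150 ns without it. The preparation cost is hidden as long as it is shorter than the round trip.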

Checklist

  • Post commit CI passes
  • Blackhole Post commit (if applicable)
  • Model regression CI testing passes (if applicable)
  • Device performance regression CI testing passes (if applicable)
  • (For models and ops writers) Full new models tests passes
  • New/Existing tests provide coverage for changes

Review thread on tt_metal/hw/inc/dev_msgs.h (outdated, resolved)
@jbaumanTT jbaumanTT force-pushed the jbauman/preloadkernel3 branch from 5d0bc7b to cbb748a Compare January 14, 2025 23:10
@jbaumanTT jbaumanTT force-pushed the jbauman/preloadkernel3 branch 2 times, most recently from 5863780 to c98e90b Compare January 15, 2025 18:12
Add a flag that lets brisc.cc start loading kernels before receiving a go
message. Fast dispatch ensures that the flag will be set only after all
necessary program data is sent to the core.

This allows preparation for the following kernel (including loading NCRISC
IRAM, setting up CBs, and initializing local memory) to happen in parallel with
the round-trip to the dispatcher_s to sync up with the other kernels and ensure
that they're all launched at the same time.
@jbaumanTT jbaumanTT force-pushed the jbauman/preloadkernel3 branch from c98e90b to 7d95eed Compare January 16, 2025 18:14
@jbaumanTT jbaumanTT requested a review from pgkeller January 16, 2025 19:55
@jbaumanTT jbaumanTT merged commit 6f416e3 into main Jan 16, 2025
11 of 12 checks passed
@jbaumanTT jbaumanTT deleted the jbauman/preloadkernel3 branch January 16, 2025 21:44

4 participants