Integrate CCL async into TG Llama with reduce-scatter and all-gather. Where an all-reduce is expected to be called, implement it with back-to-back reduce-scatter + all-gather calls.
Implement the all-reduce-async op interface (reuse the implementation from the older all-reduce V1, but replace the underlying operations with their async variants).
Not a blocker for functional bringup, but a blocker for how the model would ideally be implemented: it keeps the Llama codebase unified.
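As a rough illustration of the composite pattern, a minimal sketch is below. The dim/num_links values and keyword names are assumptions taken from the stable ttnn CCL ops; the async variants additionally need global-semaphore / sub-device arguments that are not shown here.

```python
import ttnn


def composite_all_reduce(x: ttnn.Tensor, dim: int = 3, num_links: int = 1) -> ttnn.Tensor:
    """Emulate an all-reduce with back-to-back reduce-scatter + all-gather."""
    # Reduce-scatter: every device ends up with a summed shard along `dim`.
    shard = ttnn.reduce_scatter(x, dim, math_op=ttnn.ReduceType.Sum, num_links=num_links)
    # All-gather: collect the reduced shards so every device holds the full tensor.
    return ttnn.all_gather(shard, dim, num_links=num_links)
```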
All-reduce-async from composite ops
- reduce_scatter + all-gather
- Integration: Stage 2

Implement all-reduce
- TBD
- Migrate to ttnn.experimental.all_reduce_async from the many existing all-reduce implementations (see the wrapper sketch after this list)
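To keep the Llama call sites unified across both stages, one option is a thin model-side wrapper. The sketch below is illustrative only: the function name, the `use_fused` flag, and the argument names are invented here, and the fused branch stays a placeholder until ttnn.experimental.all_reduce_async exists.

```python
import ttnn


def tg_all_reduce(x: ttnn.Tensor, dim: int, num_links: int = 1, use_fused: bool = False) -> ttnn.Tensor:
    """Single all-reduce entry point for the TG Llama model code."""
    if use_fused:
        # Stage 2 target: dispatch to ttnn.experimental.all_reduce_async once
        # it lands (its exact signature is part of the op work tracked here).
        raise NotImplementedError("all_reduce_async is not available yet")
    # Stage 1 path: composite all-reduce from reduce-scatter + all-gather.
    shard = ttnn.reduce_scatter(x, dim, math_op=ttnn.ReduceType.Sum, num_links=num_links)
    return ttnn.all_gather(shard, dim, num_links=num_links)
```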
Prerequisites
- Add all-gather subdevice worker core awareness #16637
- Add all-gather and reduce-scatter tests with the worker subdevice not starting on core (0,0) (see the sketch below)
- Fix Worker <-> fabric EDM connection teardown race #16634
- Add arbitrary shard-grid core range set support to addrgen in the command processor (ttnn/cpp/ttnn/operations/ccl/common/kernels/ccl_send_reader_two_input.cpp) #16608
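For the non-(0,0) sub-device test case, the key ingredient is a worker core grid whose origin is offset from core (0,0). A minimal sketch is below; the coordinates are arbitrary examples, not the grids the linked tests use.

```python
import ttnn

# Worker core grid deliberately anchored away from (0,0), e.g. a 2x2 block
# starting at (1,1). Handing a range like this to the CCL op's worker
# sub-device exercises the non-(0,0) start-core path.
offset_worker_cores = ttnn.CoreRangeSet(
    {ttnn.CoreRange(ttnn.CoreCoord(1, 1), ttnn.CoreCoord(2, 2))}
)
```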
Integration: Stage 1

Current plan is to have these mostly handled by @caixunshiren; the work breakdown is TBD after discussion between @avoraTT, @kpaigwar, and @caixunshiren.
Integration: Stage 2
Migrate to ttnn.experimental.all_reduce_async from the many existing all-reduce implementations
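Once the fused op exists, the Stage 2 migration should mostly be a call-site substitution. The sketch below is speculative: the real signature of ttnn.experimental.all_reduce_async is defined by the op work above and will likely also need the global-semaphore / sub-device plumbing from the async CCL infrastructure.

```python
import ttnn


def tg_all_reduce_stage2(x: ttnn.Tensor, dim: int = 3) -> ttnn.Tensor:
    # Speculative: replaces the composite reduce-scatter + all-gather path.
    # Argument names and order are assumptions, not the final interface.
    return ttnn.experimental.all_reduce_async(x, dim, math_op=ttnn.ReduceType.Sum)
```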