Optimize EDM-fabric flow-control protocols (#17495)
Essentially a rewrite of the majority of the fabric EDM. This change includes a Worker -> EDM connection teardown bug fix and updates flow control for:
* Worker -> EDM
* EDM Sender Channel
* EDM Receiver Channel

The fourth piece of the data path, sender -> receiver flow control over Ethernet, is not updated here. That is a future change, tracked in issue #17430.

# Worker -> EDM Connection Teardown Fix
Fix the worker <-> EDM fabric connection state transitions to prevent a race.

The previous connection state transitions were an invalid design:
* worker: 0 (close) -> 1 (open)
* worker: 1 (open) -> 0 (close)

This design was inadequate because the worker could open and close a connection without the EDM fabric being in the loop. In workloads with few packets per connection, this could lead to the following race:
* EDM checks the sender channel
* worker opens the connection
* worker sends its payload
* worker tears down the connection
* EDM checks the channel again and misses the teardown request, because the semaphore is already back at 0

Additionally, this change fixes a bug in worker <-> EDM connection teardown by adding a new discrete state: `close_connection_request`.

The new connection management is as follows:
* worker (open connection): updates the connection semaphore from 0 (unused connection) -> 1 (open connection)
* worker: sends traffic
* worker (close connection): updates the connection semaphore from 1 (open) -> 2 (teardown request)
* worker: waits for an ack on its local teardown address (`worker_teardown_addr` in `WorkerToFabricEdmSender::close()`)
* EDM: acknowledges the connection close by updating the `worker_teardown` semaphore in worker L1

# Worker -> EDM Flow Control Protocol
The flow control protocol is rd/wr ptr based and is implemented as follows (from the worker's perspective). The adapter has a local write pointer (wrptr) that tracks the next EDM buffer slot to write to. The adapter also has a local memory slot that holds the remote read pointer (rdptr) of the EDM.
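As a rough illustration of this pointer pair, here is a minimal C++ sketch of the worker-side bookkeeping, including the pointer-difference space check the adapter relies on. It is not the tt-metal implementation: `BufferPtr`, `ptr_distance`, `WorkerAdapterModel`, `send_packet()` and `on_edm_rdptr_update()` are hypothetical names, and the sketch assumes pointers wrap at `2 * NUM_BUFFERS` so that a full channel (`wrptr` ahead of `rdptr` by `NUM_BUFFERS`) can be told apart from an empty one (`wrptr == rdptr`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical slot pointer. Values live in [0, 2 * NUM_BUFFERS) (an assumption, not
// necessarily the EDM encoding) so that "full" and "empty" are distinguishable.
template <uint32_t NUM_BUFFERS>
struct BufferPtr {
    static_assert(NUM_BUFFERS > 0, "channel needs at least one buffer slot");
    uint32_t value = 0;

    void increment() { value = (value + 1) % (2 * NUM_BUFFERS); }
    uint32_t slot() const { return value % NUM_BUFFERS; }  // buffer slot this pointer refers to
};

// Number of slots by which `leader` is ahead of `trailer` (0 .. NUM_BUFFERS).
template <uint32_t NUM_BUFFERS>
uint32_t ptr_distance(BufferPtr<NUM_BUFFERS> leader, BufferPtr<NUM_BUFFERS> trailer) {
    return (leader.value + 2 * NUM_BUFFERS - trailer.value) % (2 * NUM_BUFFERS);
}

// Hypothetical worker-side view of one Worker -> EDM connection.
template <uint32_t NUM_BUFFERS>
class WorkerAdapterModel {
public:
    // The EDM can accept a packet while fewer than NUM_BUFFERS slots are outstanding.
    bool edm_has_space() const { return ptr_distance(wrptr_, remote_rdptr_) < NUM_BUFFERS; }

    // Claims the next EDM slot and advances wrptr; returns the slot index written.
    uint32_t send_packet() {
        assert(edm_has_space());
        uint32_t slot = wrptr_.slot();
        // A real adapter would write the payload into EDM slot `slot` here and
        // notify the EDM (that mechanism is not modelled in this sketch).
        wrptr_.increment();
        return slot;
    }

    // Models the EDM writing its rdptr into the worker's local memory slot.
    void on_edm_rdptr_update(uint32_t edm_rdptr) { remote_rdptr_.value = edm_rdptr; }

    uint32_t wrptr() const { return wrptr_.value; }

private:
    BufferPtr<NUM_BUFFERS> wrptr_;
    BufferPtr<NUM_BUFFERS> remote_rdptr_;  // local copy of the EDM rdptr; trails wrptr_
};
```

In this model the worker is the only writer of `wrptr_` and the EDM is the only writer of the mirrored rdptr, so each side only ever reads the other's pointer.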
The adapter uses the difference between these two pointers (rdptr trails wrptr) to determine whether the EDM has space to accept a new packet. As the adapter writes into the EDM, it advances its local wrptr. As the EDM reads from its local L1 channel buffer, it notifies the worker/adapter by updating the worker's `remote_rdptr` to carry the value of the EDM rdptr.

## EDM <-> EDM Channel Flow Control
The flow control protocol between EDM channels is built on rd/wr pointers, where the pointers index buffer slots within the channel (as opposed to, for example, byte or word offsets). The pointers are free to advance independently of each other as long as there is no overflow or underflow.

### Sender Channel Flow Control
Both sender channels share the same flow-control view into the receiver channel, because both channels write to the same receiver channel.
* wrptr:
  * points to the next buffer slot to write to in the remote (over Ethernet) receiver channel
  * leads the other pointers
  * advanced by the writer for every new packet
  * `has_data_to_send(): local_wrptr != remote_sender_wrptr`
* ackptr:
  * trails `wrptr`
  * advances as the channel receives acknowledgements from the receiver
  * as it advances, the sender channel can notify the upstream worker of additional space in the sender channel buffer
* completion_ptr:
  * trails `local_wrptr`
  * acts as the "rdptr" from the remote sender's perspective
  * advances as packets are completed by the receiver
  * as it advances, the sender channel can write additional packets to the receiver at this slot

### Receiver Channel Flow Control
* ackptr/rdptr:
  * leads all other pointers
  * indicates the next buffer slot where data is expected to arrive (from the remote sender)
  * advances as packets are received (and acked)
  * must not overlap the completion pointer
* wr_sent_ptr:
  * trails `ackptr`
  * indicates the buffer slot currently being processed and written out
  * advances after all forwarding writes (to the NoC or a downstream EDM) have been initiated
* wr_flush_ptr:
  * trails `wr_sent_ptr`
  * advances as writes are flushed
* completion_ptr:
  * trails `wr_flush_ptr`
  * indicates the next buffer slot in the receiver channel to send completion acks for
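The receiver-channel pointer chain (ackptr -> wr_sent_ptr -> wr_flush_ptr -> completion_ptr) can be modelled as a pipeline in which each pointer may only advance up to the pointer ahead of it. The sketch below is a hypothetical C++ model, not EDM code: `ReceiverChannelModel` and its method names are illustrative, and, as in the worker-side sketch, it assumes pointers wrap at `2 * NUM_BUFFERS` so a full channel is distinguishable from an empty one.

```cpp
#include <cstdint>

// Hypothetical model of the EDM receiver-channel pointers. In ring order:
//   completion_ptr <= wr_flush_ptr <= wr_sent_ptr <= ackptr <= completion_ptr + NUM_BUFFERS
// Pointers wrap at 2 * NUM_BUFFERS (an assumption) so a full channel differs from an empty one.
template <uint32_t NUM_BUFFERS>
class ReceiverChannelModel {
public:
    // A packet from the remote sender landed in slot ackptr and was acked.
    bool on_packet_received() {
        if (distance(ackptr_, completion_ptr_) == NUM_BUFFERS) {
            return false;  // advancing would overlap completion_ptr
        }
        ackptr_ = next(ackptr_);
        return true;
    }

    // All forwarding writes (to NoC or downstream EDM) for slot wr_sent_ptr were initiated.
    bool on_writes_sent() { return advance_up_to(wr_sent_ptr_, ackptr_); }

    // The writes issued for slot wr_flush_ptr have been flushed.
    bool on_writes_flushed() { return advance_up_to(wr_flush_ptr_, wr_sent_ptr_); }

    // A completion ack was sent to the remote sender for slot completion_ptr.
    bool on_completion_sent() { return advance_up_to(completion_ptr_, wr_flush_ptr_); }

    // Slots holding packets that have not been completed yet.
    uint32_t slots_in_use() const { return distance(ackptr_, completion_ptr_); }

private:
    static uint32_t next(uint32_t p) { return (p + 1) % (2 * NUM_BUFFERS); }
    static uint32_t distance(uint32_t leader, uint32_t trailer) {
        return (leader + 2 * NUM_BUFFERS - trailer) % (2 * NUM_BUFFERS);
    }
    // Advances `trailer` by one slot unless it has already caught up with `leader`.
    static bool advance_up_to(uint32_t& trailer, uint32_t leader) {
        if (trailer == leader) {
            return false;
        }
        trailer = next(trailer);
        return true;
    }

    uint32_t ackptr_ = 0;
    uint32_t wr_sent_ptr_ = 0;
    uint32_t wr_flush_ptr_ = 0;
    uint32_t completion_ptr_ = 0;
};
```

In this model, keeping `wr_sent_ptr` separate from `wr_flush_ptr` lets the receiver start forwarding later packets before earlier writes have flushed, while only flushed slots are completed and handed back to the sender.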