-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
#0: Hoist SubDeviceManager/Lock-Step Allocator to MeshDevice #16878
base: main
Are you sure you want to change the base?
Conversation
1f34d17
to
92a2dd9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clang-Tidy
found issue(s) with the introduced code (1/1)
92a2dd9
to
d91d8e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some high level questions for now. In general, I think it is worth exploring a better "allocator" abstraction that is independent of details of mesh / single device / sub device.
5968a64
to
2f3c743
Compare
2c4783e
to
89a2fd9
Compare
89a2fd9
to
a04ce94
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sub-Device related changes look okay to me. Cleanup of how we want to expose these apis/interfaces instead of having a bunch of wrapping fns is a different discussion/issue from this pr.
|
||
// Helper function to verify all devices in the MeshDevice have the same value | ||
template <typename F> | ||
void validate_devices_return_same_value( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the intention, can you make validate_devices_return_same_value
validate and return the value? You wouldn't have to repeat the lines like return reference_device()->num_hw_cqs();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works fine for return values that are by-value but there were some issues when we defer to reference device method that needs to return const ref.
@@ -58,6 +59,9 @@ class SubDeviceManagerTracker { | |||
// default case to not affect performance | |||
SubDeviceManagerId get_default_sub_device_manager_id() const; | |||
|
|||
std::optional<DeviceAddr> lowest_occupied_compute_l1_address( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's aim to separate allocator from subdevices, and supply it externally instead... This method shouldn't live in the "subdevice manager tracker" abstraction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's aim to separate allocator from subdevices, and supply it externally instead...
This is an involved change that is outside the scope of this PR. Currently SubDeviceManager fully owns allocator and SubDeviceManagerTracker
owns multiple SubDeviceManager
objects. But some entity needs to query across allocators. This was previously done in Device
and now it's in SubDeviceManagerTracker
. This is more appropriate than what living on Device
because otherwise we'd duplicate the implementation between Device/MeshDevice.
Ticket
Link to Github Issue
Problem description
As part of the TT-Mesh scope of work, we want to natively virtualize the devices in the mesh. Part of this virtualization process involves deferring to a single set of allocators (global, per-subdevice) at the MeshDevice level, instead of issuing repeated allocations on each of the per-device allocators.
What's changed
SubDeviceManagerTracker
to the MeshDevice level.Checklist