Legion Prof renders implicit top-level tasks on "I/O" processor #1828

elliottslaughter · 2025-01-30T20:12:07Z

As a consequence of the design decision made in #1770 (comment), Legion currently synthesizes a new I/O processor for each implicit top-level task. The problem with this approach is that it is potentially confusing to users: after all, the original program may not use I/O processors at all, and users may not even be aware of what I/O processors are, let alone the representation that we use to implement them.

@manopapad asked me to brainstorm some potential solutions to this issue so that users can understand more easily what their program is doing. Here's what I can think of, but maybe @lightsighter has other ideas:

Fix Realm to synthesize the external processors correctly, so that we can remove this entire hack from Legion.
Fix Legion to synthesize a new processor of the type the implicit task is actually running on. I believe the issue reported in legion_prof_rs: assertion failed: running_stop <= start #1770 was primarily because we had previously picked an existing processor at random to stick our implicit top-level task onto. This caused issues because the profiler expects some invariants to be maintained (on any non-I/O processors, only one task may run at a time). However, I think this should not be an issue as long as we synthesize a new processor of the correct type. This would at least show the processor to the user as a "CPU" processor or similar.
Create a fake processor type in Legion called "top-level" that is never actually instantiated at runtime, but exists only to communicate to the profiler that this is really a top-level task. The profiler would need to be updated to learn about the new kind, but would not otherwise require updates.
Create a mechanism in Legion to attach custom names to processors of any kind, which Legion can then use to label processors such as those I/O processors used for implicit top-level tasks. The profiler would need to learn how to display these custom names and we'd need to decide exactly how to do that.

lightsighter · 2025-01-31T00:42:23Z

Some thoughts on the different approaches:

I think I've given up on trying to get Realm to do the "external thread drafting" to support this completely (see External Thread Drafting #716). We could get Realm to synthesize just the IDs, but that seems not useful.
The details about why it didn't work is correct. The top-level task pretending to be running on the processor and other tasks actually running on the processor would appear to be running concurrently, which is unsound from the profiler's perspective. In order for the synthesized new processor to work, you'd have to make a fresh one for every new top-level task. I'll point out that this is also not accurate as it can be the case that multiple implicit top-level tasks are often sharing the same external threads when running so appear to put them on different processors is not accurate either.
Seems like a reasonable option to me as it would give the best indication of what was actually happening. I was just too lazy to do it before since it required lots of changes to the profiler. We'd probably want one of these for each node. I also don't think we need to make a fake processor ID either. Inside Legion we already have all the implicit top-level task profiling isolated anyway so we'll just log it out special without needing to pretend that it's custom anymore as the profiler could handle it now.
Not a big fan of this one. It's just moving the problem around a little bit from option 2 with some slightly nicer names.

I don't think I have any better ideas. Option 3 is my preferred choice.

elliottslaughter · 2025-01-31T17:37:11Z

Option 3 is fine and should be easy as long as you keep synthesizing the fake processor IDs. ProcKind is just an enum and it's pretty trivial to add new entries. There will be a handful of places that require corresponding updates, but really not that many.

If you want to also stop synthesizing processor IDs entirely, that's a larger issue because we use processor IDs for a variety of purposes within the profiler, making that a much more challenging change. But I think it's also needless: a new processor kind does not imply abandoning synthesizing processor IDs.

lightsighter · 2025-02-01T18:35:26Z

I'm not entirely sure that they way I am synthesizing processor IDs is sound, and it might also conflict with the way Realm adds processor IDs if/when the machine is dynamic (e.g. processors are added). I would prefer to get rid of synthesizing processor IDs in the future, but we can keep them for now if we have to.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Legion Prof renders implicit top-level tasks on "I/O" processor #1828

Legion Prof renders implicit top-level tasks on "I/O" processor #1828

elliottslaughter commented Jan 30, 2025

lightsighter commented Jan 31, 2025

elliottslaughter commented Jan 31, 2025

lightsighter commented Feb 1, 2025

Legion Prof renders implicit top-level tasks on "I/O" processor #1828

Legion Prof renders implicit top-level tasks on "I/O" processor #1828

Comments

elliottslaughter commented Jan 30, 2025

lightsighter commented Jan 31, 2025

elliottslaughter commented Jan 31, 2025

lightsighter commented Feb 1, 2025