Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No map_distributed_send_ref_holder method in NUserCollector #288

Closed
majosm opened this issue Mar 24, 2022 · 10 comments
Closed

No map_distributed_send_ref_holder method in NUserCollector #288

majosm opened this issue Mar 24, 2022 · 10 comments

Comments

@majosm
Copy link
Collaborator

majosm commented Mar 24, 2022

I ran into this error:

AttributeError: 'NUserCollector' object has no attribute 'map_distributed_send_ref_holder'

while working on inter-volume communication for inducer/grudge#236. Not sure if this method is supposed to exist, or if I'm doing something wrong? (Possibly both...)

cc @matthiasdiener

@majosm majosm changed the title No 'map_distributed_send_ref_holder' method in NUserCollector No map_distributed_send_ref_holder method in NUserCollector Mar 24, 2022
@inducer
Copy link
Owner

inducer commented Mar 24, 2022

We could support that (it'd be simple to add), but I'm puzzled on why it's needed. Could you post a full traceback?

@majosm
Copy link
Collaborator Author

majosm commented Mar 24, 2022

We could support that (it'd be simple to add), but I'm puzzled on why it's needed. Could you post a full traceback?

https://gist.github.com/majosm/038212d3c9dbd25776c1534611d6e1b0

I'm doing some wacky stuff with send stapling to try to make sure the sends actually have something that depends on them (I can't use the receives anymore since they don't map one-to-one). Guessing this might be related to that?

@kaushikcfd
Copy link
Collaborator

I think PytatoPyOpenCLArrayContext.freeze does not support DAGs with communication nodes.

As a side note, we should catch such things and have a clearer error message for such cases.

@majosm
Copy link
Collaborator Author

majosm commented Mar 24, 2022

I think PytatoPyOpenCLArrayContext.freeze does not support DAGs with communication nodes.

Not sure I'm understanding (yet). The test was already distributed before I started tinkering with the code being tested; it ran fine on the unmodified version of grudge. Is there something in particular I need to be doing in order to avoid having communication in the DAG before freeze?

@inducer
Copy link
Owner

inducer commented Mar 24, 2022

If you'd like to execute anything with communication, you need to go through compile (not freeze), because of this FIXME:

https://github.com/inducer/grudge/blob/c17b7d30416482aaa4a9b0597f91d38484d0a487/grudge/array_context.py#L259

@majosm
Copy link
Collaborator Author

majosm commented Mar 24, 2022

Well, I made a minor change to my code and the problem went away. Still don't really understand what the issue was. Prior to making the change I attempted to debug the issue, and I got to the point of determining that my receive nodes weren't in the final DAG for some reason. But I wasn't able to find out why. Once I get my changes cleaned up a little, I can post the working/not-working versions if anyone wants to take a closer look.

BTW, the test_func_comparison_mpi test in grudge doesn't currently compile. Should that be fixed? (It didn't affect the outcome of the test for me.)

@inducer
Copy link
Owner

inducer commented Mar 25, 2022

BTW, the test_func_comparison_mpi test in grudge doesn't currently compile. Should that be fixed? (It didn't affect the outcome of the test for me.)

Yikes, good catch. That implies that it's not communicating, which is weird. (it should be... that's the whole point of the test!) Looking into it now.

@majosm
Copy link
Collaborator Author

majosm commented Mar 25, 2022

Here is the code that was triggering this issue: https://github.com/inducer/grudge/blob/823f7fbbc7602c0dbabd1a7a1b2e2d2bcf4eeec6/grudge/trace_pair.py#L722-L726

I explain what I'm attempting to accomplish with the zero in inducer/grudge#243. Not sure why adding it to the interior value of the trace pair causes problems but adding it to the exterior doesn't. Any ideas?

@inducer
Copy link
Owner

inducer commented Mar 25, 2022

inducer/grudge#244

@majosm
Copy link
Collaborator Author

majosm commented Mar 25, 2022

inducer/grudge#244

This also fixed the issue I was having here. If I'm understanding correctly, I think it's because the sends were a dependency of both int/ext, but the receives are only a dependency of ext. So the test accessing int instead of ext meant that the receives would never get executed. (And "fixing" it such that the sends are also only dependencies of ext meant that neither of the sends/recvs were getting executed in that test.)

@majosm majosm closed this as completed Mar 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants