Inconsistent performance @ production #498
If this can't be explained by variations in software or node (and it's quite plausible that it might not), this might be due to some nondeterminism in our code transformation pipeline. This could happen, for example, if the expression graph gets traversed in one order one time, leading to a "fast" timing, and in another order the next time, leading to a "slow" timing. To confirm or refute this theory, it would be great if we could collect the generated Loopy and OpenCL code for "fast" and "slow" runs. As discussed with @MTCam, this patch against https://github.com/inducer/arraycontext/ permits the former. I'm mindful that this creates issues for gathering performance data, e.g. h/p scaling, for the review. @kaushikcfd Any thoughts on what might be at play here?
I think that's not quite the correct place to print the IR. We should print the IR (or the generated code) after transforming the t-unit. https://github.com/inducer/arraycontext/blob/82117c73c24cd611f038eb663ccca21f1f679421/arraycontext/impl/pytato/compile.py#L244 is probably the right spot.
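For concreteness, here is a minimal sketch of what such a dump hook could look like at that point, i.e. once the array context's transformations have been applied to the translation unit. The function name, the `dump_dir`/`tag` arguments, and the file layout are illustrative assumptions, not part of the patch discussed above:

```python
import os

import loopy as lp


def dump_transformed_t_unit(t_unit, dump_dir="code-dumps", tag="run"):
    """Write the transformed Loopy IR and the generated device code to disk."""
    os.makedirs(dump_dir, exist_ok=True)

    # Loopy IR after the array context's transformations have been applied
    with open(os.path.join(dump_dir, f"{tag}-t-unit.txt"), "w") as f:
        f.write(str(t_unit))

    # Generated OpenCL device code for the same translation unit
    with open(os.path.join(dump_dir, f"{tag}-device-code.cl"), "w") as f:
        f.write(lp.generate_code_v2(t_unit).device_code())
```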
Any objections to merging parallel-lazy into y1-production, @matthiasdiener? If not, I may shift y1-production so that it can run lazy out of the box, and will use the version of arraycontext from here. That way we can get the program dumps automatically. I guess dumping the program should become an option so we can enable it when running timings?
No objection from me.
On generating the code locally 10 different times: one way to see whether we should attribute this to our code-gen framework would be to check whether we still observe the performance difference even after fixing a particular generated version of the code.
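To make that comparison concrete, here is one hypothetical way to check whether the dumped code actually differs between a "fast" and a "slow" run, by hashing the dump files. The directory layout (one directory of `*.cl` files per run) is an assumption for illustration, not something the patch above produces:

```python
import hashlib
from pathlib import Path


def hash_dumped_kernels(run_dir):
    """Return {filename: sha256 of contents} for all dumped kernels in run_dir."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(run_dir).glob("*.cl"))
    }


fast = hash_dumped_kernels("code-dumps/fast-run")
slow = hash_dumped_kernels("code-dumps/slow-run")
for name in sorted(set(fast) | set(slow)):
    if fast.get(name) != slow.get(name):
        print(f"{name}: generated code differs between runs")
```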
@MTCam pointed out in the dev meeting today that this may be mainly a function of which node (in the lassen debug queue) he runs on. IIRC, he said he has not yet observed the slow runs when running on the "production" (?) queue. |
Right. I have not seen anything since switching out of the debug queue and into the batch queue. The sampling frequency has been turned down to twice daily. This may just "go away".
The plot in the description has been updated with the current state. Looks like this may have been a system issue. |
Close for now? |
This is an issue-in-the-making. First, the automated timings on Lassen are catching some inconsistent results:
Note that starting around last Friday, the timing results began to vary quite a bit between runs, which is not normal for this code.
Update: The issue seems to have been resolved by switching to the batch queue, suggesting that the problem was bad nodes or bad devices in the debug queue.
The issue does not appear to be connected to any particular Lassen node; spikes were observed on both lassen34 and lassen36 from the debug queue.