Running this benchmark based on wave-op-mpi.py on 1c44c4b with the command

PYTHONHASHSEED=17 PYOPENCL_TEST=port:nvid setarch -R numactl -C 2 -m 0 nvprof -f -o yoink.nvvp python -O wave-op-mpi.py --dim=3 --order=4

on dunkel gives me the following profile in NVIDIA's visual profiler:
There are at least two things wrong here (both circled):

cuLaunchKernel seems to take a very long time. Why?
Curiously, there seem to be periods that don't suffer from this:
If we could fix these two types of stalls, I suspect our performance story would look quite a bit different.
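Stalls like these can also be located programmatically once the trace data is out of the visual profiler. The Python sketch below is hypothetical. It assumes host API records and GPU activity intervals have already been exported from the profile and parsed (parsing not shown). The `ApiCall` layout, the function names, and the 10x-median threshold are illustrative choices, not part of nvprof or of the original report. `find_slow_calls` flags unusually long `cuLaunchKernel` calls, and `find_idle_gaps` merges overlapping activity intervals and reports the idle stretches between them.

```python
from statistics import median
from typing import List, NamedTuple, Tuple


class ApiCall(NamedTuple):
    """One host-side API record (hypothetical layout; trace parsing not shown)."""
    name: str
    start_us: float
    duration_us: float


def find_slow_calls(calls: List[ApiCall], name: str = "cuLaunchKernel",
                    factor: float = 10.0) -> List[ApiCall]:
    """Return calls to `name` lasting more than `factor` times their median duration."""
    matching = [c for c in calls if c.name == name]
    if not matching:
        return []
    typical = median(c.duration_us for c in matching)
    return [c for c in matching if c.duration_us > factor * typical]


def find_idle_gaps(intervals: List[Tuple[float, float]],
                   min_gap: float) -> List[Tuple[float, float]]:
    """Return device-idle stretches of at least `min_gap` between activities.

    `intervals` are (start, end) pairs of GPU activity (e.g. kernels, copies)
    in one common time unit; overlapping intervals are merged before gaps
    are measured.
    """
    gaps = []
    busy_until = None  # end of the merged activity seen so far
    for start, end in sorted(intervals):
        if busy_until is not None and start - busy_until >= min_gap:
            gaps.append((busy_until, start))
        busy_until = end if busy_until is None else max(busy_until, end)
    return gaps
```

Comparing the timestamps of flagged launches with the reported gaps is one way to check whether the idle stretches coincide with slow launches, and to separate the periods that suffer from the stall from those that don't.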
cc @matthiasdiener @lukeolson
Other versions in use, for reproducibility:
We could try using https://github.com/conda-forge/lttng-ust-feedstock for tracing. POCL already supports tracing through LTTng: pocl/pocl@ef737d3
I faced a similar issue whenever I was using almost all of the device's global memory:
(Notice the two unexplained voids in the profile.)
After moving to a slightly coarser mesh, the phenomenon no longer appeared.
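To see how close a given mesh comes to filling device memory, a back-of-the-envelope estimate can help. The sketch below rests on assumptions not stated in this thread: simplicial (tetrahedral in 3D) elements with a nodal basis, where a degree-p element in d dimensions carries C(p+d, d) nodes; float64 data; and a wave state made of u plus a d-component v (d + 1 fields). The function names are hypothetical, and the estimate covers only one copy of the state. Time-stepper stages, face data, and temporaries come on top.

```python
from math import comb


def nodes_per_simplex(dim: int, order: int) -> int:
    """Nodes of a degree-`order` nodal basis on a `dim`-simplex: C(order + dim, dim)."""
    return comb(order + dim, dim)


def wave_state_bytes(n_elements: int, dim: int = 3, order: int = 4,
                     bytes_per_value: int = 8) -> int:
    """Bytes for ONE copy of the state: u plus a dim-vector v (dim + 1 fields).

    Assumes float64 values by default; RK stages, face/flux arrays and
    temporaries are not counted.
    """
    n_fields = dim + 1
    return n_fields * n_elements * nodes_per_simplex(dim, order) * bytes_per_value
```

For the `--dim=3 --order=4` run above, each tetrahedron carries 35 nodes, so one copy of the state costs 4 x 35 x 8 = 1120 bytes per element, and a million elements need about 1.1 GB per copy. In 3D, halving the resolution along every axis cuts the element count, and with it this footprint, by roughly a factor of 8.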