-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PyOpenCL target: Overflow large argument counts into SVM struct #642
Conversation
ae9eb7c
to
054d05a
Compare
With the following passes for me:
Let me know if you can reproduce that. |
b643561
to
a9b2746
Compare
I confirmed that this PR works on POCL-pthreads and Nvidia CL when also using inducer/pyopencl#452 (and pocl/pocl#1069 in the POCL-pthreads case). With POCL-cuda and pocl/pocl#1067 in addition to pocl/pocl#1069 it currently fails when trying to access an array on the host after executing a GPU kernel. The following code reproduces this: import numpy as np
import pyopencl as cl
import pyopencl.array as cla
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
alloc = cl.tools.SVMAllocator(ctx, cl.svm_mem_flags.READ_WRITE, queue=queue)
ary = cla.zeros(queue, 20, np.float64, allocator=alloc)
ary.fill(17) # runs as GPU kernel
ary.copy() # segfaults here The full output log is (with $ python svm2.py ocl-icd(ocl_icd_loader.c:776): __initClIcd: Reading icd list from '/home/mdiener/Work/pocl/install/etc/OpenCL/vendors/' ocl-icd(ocl_icd_loader.c:234): _find_num_icds: return: 1/0x1 ocl-icd(ocl_icd_loader.c:265): _open_driver: Considering file '/home/mdiener/Work/pocl/install/etc/OpenCL/vendors//pocl.icd' ocl-icd(ocl_icd_loader.c:239): _load_icd: Loading ICD '/home/mdiener/Work/pocl/install/lib/libpocl.so.2.9.0' ocl-icd(ocl_icd_loader.c:243): _load_icd: ICD[0] loaded ocl-icd(ocl_icd_loader.c:297): _open_driver: return: 1/0x1 ocl-icd(ocl_icd_loader.c:320): _open_drivers: return: 1/0x1 ocl-icd(ocl_icd_loader.c:477): _find_and_check_platforms: Checking ICD 0/1 ocl-icd(ocl_icd_loader.c:325): _get_function_addr: Looking for function clGetExtensionFunctionAddress ocl-icd(ocl_icd_loader.c:343): _get_function_addr: return: 140254412730417/0x7f8f8672f031 ocl-icd(ocl_icd_loader.c:325): _get_function_addr: Looking for function clIcdGetPlatformIDsKHR ocl-icd(ocl_icd_loader.c:328): _get_function_addr: Missing global symbol 'clIcdGetPlatformIDsKHR' in ICD, should be skipped ocl-icd(ocl_icd_loader.c:343): _get_function_addr: return: 140254412732259/0x7f8f8672f763 ocl-icd(ocl_icd_loader.c:325): _get_function_addr: Looking for function clGetPlatformInfo ocl-icd(ocl_icd_loader.c:328): _get_function_addr: Missing global symbol 'clGetPlatformInfo' in ICD, should be skipped ocl-icd(ocl_icd_loader.c:343): _get_function_addr: return: 140254412570414/0x7f8f86707f2e ocl-icd(ocl_icd_loader.c:526): _find_and_check_platforms: Try to load 1 platforms ocl-icd(ocl_icd_loader.c:348): _allocate_platforms: Requesting allocation for 1 platforms ocl-icd(ocl_icd_loader.c:358): _allocate_platforms: return: 1/0x1 ocl-icd(ocl_icd_loader.c:533): _find_and_check_platforms: Checking platform 0 ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: cl_khr_icd cl_pocl_content_size ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: POCL ocl-icd(ocl_icd_loader.c:603): _find_and_check_platforms: Extension suffix: POCL ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: FULL_PROFILE ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: OpenCL 3.0 PoCL 3.1-pre cuda-svm-0-g8e6dd829 Linux, Debug+Asserts, RELOC, SPIR, LLVM 14.0.6, SLEEF, CUDA, POCL_DEBUG ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: Portable Computing Language ocl-icd(ocl_icd_loader.c:384): _malloc_clGetPlatformInfo: return: The pocl project ocl-icd(ocl_icd_loader.c:431): _sort_platforms: Nb platefroms: 1 ocl-icd(ocl_icd_loader.c:824): __initClIcd: 1 valid vendor(s)! ocl-icd(ocl_icd_loader.c:1060): clGetPlatformIDs: Entering ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering ** Final POCL_DEBUG flags: FFFFFFFFFFFFFFFF [2022-07-06 04:18:58.406713745]POCL: in fn pocl_install_sigfpe_handler at line 229: | GENERAL | Installing SIGFPE handler... [2022-07-06 04:18:58.522237845]POCL: in fn pocl_cuda_init at line 397: | GENERAL | [CUDA] GPU architecture = sm_35 [2022-07-06 04:18:58.522300352]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/nvvm/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522331286]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/nvvm/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522343606]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/nvidia-cuda-toolkit/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522360890]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/nvidia-cuda-toolkit/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522373642]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522387107]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522400285]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/cuda/nvvm/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522415089]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/cuda/nvvm/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522427965]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/cuda/nvidia-cuda-toolkit/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522440770]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/cuda/nvidia-cuda-toolkit/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522453608]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/cuda/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522466226]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/cuda/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522479006]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/nvvm/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522491669]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/nvvm/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522503989]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522517051]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/nvidia-cuda-toolkit/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522529598]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/local/lib/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522542592]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/local/lib/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522555310]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/lib/nvvm/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522568743]POCL: in fn findLibDevice at line 581: | CUDA | looking for libdevice at '/usr/lib/nvvm/libdevice/libdevice.compute_35.10.bc' [2022-07-06 04:18:58.522580592]POCL: in fn findLibDevice at line 569: | CUDA | looking for libdevice at '/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc' [2022-07-06 04:18:58.522596110]POCL: in fn findLibDevice at line 572: | CUDA | found libdevice at '/usr/lib/nvidia-cuda-toolkit/libdevice/libdevice.10.bc' ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1683): clGetDeviceIDs: Entering ocl-icd(ocl_icd_loader_gen.c:1691): clGetDeviceIDs: return: 0/0x0 ocl-icd(ocl_icd_loader.c:1140): clCreateContext: Entering [2022-07-06 04:18:58.630074880]POCL: in fn POclCreateCommandQueue at line 47: | GENERAL | Create Command queue on device 1 [2022-07-06 04:18:58.630114638]POCL: in fn void pocl_llvm_create_context(cl_context) at line 379: | LLVM | creating LLVM context ocl-icd(ocl_icd_loader.c:1149): clCreateContext: return: 94104096093728/0x559651383620 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:3259): clCreateCommandQueueWithProperties: Entering [2022-07-06 04:18:58.630255155]POCL: in fn POclCreateCommandQueue at line 47: | GENERAL | Create Command queue on device 1 [2022-07-06 04:18:58.630266146]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 2 ocl-icd(ocl_icd_loader_gen.c:3268): clCreateCommandQueueWithProperties: return: 94104096102768/0x559651385970 ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering [2022-07-06 04:18:58.630350961]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 3 ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1765): clRetainCommandQueue: Entering [2022-07-06 04:18:58.630446057]POCL: in fn POclRetainCommandQueue at line 33: | REFCOUNTS | Retain Command Queue 0x559651385970 : 2 ocl-icd(ocl_icd_loader_gen.c:1771): clRetainCommandQueue: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:3310): clSVMAlloc: Entering SVM cuMemAllocManaged 160 before write 0 0x4204060000 after write 42.000000 [2022-07-06 04:18:58.630824524]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 4 [2022-07-06 04:18:58.630836034]POCL: in fn POclSVMAlloc at line 114: | MEMORY | Allocated SVM: PTR 0x4204060000, SIZE 160, FLAGS 1 ocl-icd(ocl_icd_loader_gen.c:3316): clSVMAlloc: return: 283535343616/0x4204060000 ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:3379): clEnqueueSVMMemFill: Entering [2022-07-06 04:18:58.631313572]POCL: in fn POclRetainCommandQueue at line 33: | REFCOUNTS | Retain Command Queue 0x559651385970 : 3 [2022-07-06 04:18:58.631323842]POCL: in fn pocl_create_event at line 514: | EVENTS | Created event 0x55965138ddb0 / ID 1 / Command svm_memfill [2022-07-06 04:18:58.631333776]POCL: in fn pocl_create_command_struct at line 630: | EVENTS | event pointer provided [2022-07-06 04:18:58.631342637]POCL: in fn pocl_create_command_struct at line 650: | EVENTS | Created command struct: CMD 0x55965138dcd0 (event 1 / 0x55965138ddb0, type: svm_memfill) [2022-07-06 04:18:58.631353096]POCL: in fn pocl_command_enqueue at line 1061: | EVENTS | In-order Q; adding event syncs [2022-07-06 04:18:58.631362451]POCL: in fn pocl_command_enqueue at line 1105: | EVENTS | Pushed Event 1 to CQ 5. [2022-07-06 04:18:58.631372020]POCL: in fn pocl_update_event_queued at line 1922: | EVENTS | Event queued: 1 [2022-07-06 04:18:58.631394026]POCL: in fn pocl_update_event_submitted at line 1942: | EVENTS | Event submitted: 1 BEFORE MEMFILL 160 0x4204060000 AFTER MEMFILL ocl-icd(ocl_icd_loader_gen.c:3385): clEnqueueSVMMemFill: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2005): clCreateProgramWithSource: Entering [2022-07-06 04:18:58.632158085]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 5 ocl-icd(ocl_icd_loader_gen.c:2014): clCreateProgramWithSource: return: 94104096146192/0x559651390310 ocl-icd(ocl_icd_loader_gen.c:2066): clBuildProgram: Entering [2022-07-06 04:18:58.632208081]POCL: in fn compile_and_link_program at line 691: | LLVM | building program with options -I /shared/home/mdiener/Work/pyopencl/pyopencl/cl [2022-07-06 04:18:58.632218813]POCL: in fn compile_and_link_program at line 713: | LLVM | building program for 0 devs with options -I /shared/home/mdiener/Work/pyopencl/pyopencl/cl [2022-07-06 04:18:58.632228799]POCL: in fn compile_and_link_program at line 717: | LLVM | BUILDING for device: Tesla K40c [2022-07-06 04:18:58.632238807]POCL: in fn pocl_driver_build_source at line 511: | LLVM | building from sources for device 0 [2022-07-06 04:18:58.632438048]POCL: in fn int pocl_llvm_build_program(cl_program, unsigned int, cl_uint, _cl_program* const*, const char**, int) at line 382: | LLVM | all build options: -mllvm --nvptx-short-ptr -Dcl_khr_int64 -DPOCL_DEVICE_ADDRESS_BITS=64 -D__USE_CLANG_OPENCL_C_H -xcl -Dinline= -I. -cl-kernel-arg-info -I /shared/home/mdiener/Work/pyopencl/pyopencl/cl -D__ENDIAN_LITTLE__=1 -DCL_DEVICE_MAX_GLOBAL_VARIABLE_SIZE=0 -D__OPENCL_VERSION__=120 -cl-std=CL1.2 -D__OPENCL_C_VERSION__=120 -Dcl_khr_byte_addressable_store=1 -Dcl_khr_global_int32_base_atomics=1 -Dcl_khr_global_int32_extended_atomics=1 -Dcl_khr_local_int32_base_atomics=1 -Dcl_khr_local_int32_extended_atomics=1 -Dcl_khr_fp64=1 -Dcl_khr_int64_base_atomics=1 -Dcl_khr_int64_extended_atomics=1 -Dcl_nv_device_attribute_query=1 -Dcl_khr_spir=1 -cl-ext=-all,+cl_khr_byte_addressable_store,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_extended_atomics,+cl_khr_local_int32_base_atomics,+cl_khr_local_int32_extended_atomics,+cl_khr_fp64,+cl_khr_int64_base_atomics,+cl_khr_int64_extended_atomics,+cl_nv_device_attribute_query,+cl_khr_spir -fno-builtin -triple=nvptx64 -target-cpu sm_35 ocl-icd(ocl_icd_loader_gen.c:2072): clBuildProgram: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2081): clGetProgramInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2087): clGetProgramInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2081): clGetProgramInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2087): clGetProgramInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2097): clGetProgramBuildInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2103): clGetProgramBuildInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2097): clGetProgramBuildInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2103): clGetProgramBuildInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2110): clCreateKernel: Entering ocl-icd(ocl_icd_loader_gen.c:2119): clCreateKernel: return: 94104096155376/0x5596513926f0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering [2022-07-06 04:18:58.697513230]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 6 ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering [2022-07-06 04:18:58.697573667]POCL: in fn POclReleaseContext at line 47: | REFCOUNTS | Release Context ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering [2022-07-06 04:18:58.698408613]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 6 ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering [2022-07-06 04:18:58.698444839]POCL: in fn POclReleaseContext at line 47: | REFCOUNTS | Release Context ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering [2022-07-06 04:18:58.698467371]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 6 ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering [2022-07-06 04:18:58.698561830]POCL: in fn POclReleaseContext at line 47: | REFCOUNTS | Release Context ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2178): clGetKernelInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2184): clGetKernelInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1711): clRetainContext: Entering [2022-07-06 04:18:58.698612903]POCL: in fn POclRetainContext at line 32: | REFCOUNTS | Retain Context 0x559651383620 : 6 ocl-icd(ocl_icd_loader_gen.c:1717): clRetainContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1737): clGetContextInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1743): clGetContextInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1722): clReleaseContext: Entering [2022-07-06 04:18:58.698642850]POCL: in fn POclReleaseContext at line 47: | REFCOUNTS | Release Context ocl-icd(ocl_icd_loader_gen.c:1728): clReleaseContext: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1666): clGetPlatformInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1674): clGetPlatformInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2050): clReleaseProgram: Entering [2022-07-06 04:18:58.699081032]POCL: in fn POclReleaseProgram at line 50: | REFCOUNTS | Release program 0x559651390310, new refcount: 1, kernel #: 1 ocl-icd(ocl_icd_loader_gen.c:2056): clReleaseProgram: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2194): clGetKernelWorkGroupInfo: Entering ocl-icd(ocl_icd_loader_gen.c:2200): clGetKernelWorkGroupInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1700): clGetDeviceInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1706): clGetDeviceInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:3441): clSetKernelArgSVMPointer: Entering [2022-07-06 04:18:58.699270966]POCL: in fn POclSetKernelArgSVMPointer at line 43: | GENERAL | Setting kernel ARG 0 to SVM 0x4204060000 ocl-icd(ocl_icd_loader_gen.c:3447): clSetKernelArgSVMPointer: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2163): clSetKernelArg: Entering [2022-07-06 04:18:58.699293930]POCL: in fn POclSetKernelArg at line 107: | GENERAL | Kernel fill || SetArg idx 1 || long || Local 0 || Size 8 || Value 0x7ffc31e941a8 || Pointer (nil) || *(uint32*)Value: 0 || *(uint64*)Value: 0 || Hex Value: 00000000 00000000 ocl-icd(ocl_icd_loader_gen.c:2169): clSetKernelArg: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2163): clSetKernelArg: Entering [2022-07-06 04:18:58.699312532]POCL: in fn POclSetKernelArg at line 107: | GENERAL | Kernel fill || SetArg idx 2 || double || Local 0 || Size 8 || Value 0x7ffc31e941a8 || Pointer (nil) || *(uint32*)Value: 0 || *(uint64*)Value: 4625478292286210048 || Hex Value: 00000000 00003140 ocl-icd(ocl_icd_loader_gen.c:2169): clSetKernelArg: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2163): clSetKernelArg: Entering [2022-07-06 04:18:58.699330226]POCL: in fn POclSetKernelArg at line 107: | GENERAL | Kernel fill || SetArg idx 3 || long || Local 0 || Size 8 || Value 0x7ffc31e941a8 || Pointer (nil) || *(uint32*)Value: 0 || *(uint64*)Value: 20 || Hex Value: 14000000 00000000 ocl-icd(ocl_icd_loader_gen.c:2169): clSetKernelArg: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:2507): clEnqueueNDRangeKernel: Entering [2022-07-06 04:18:58.699356022]POCL: in fn POclEnqueueNDRangeKernel at line 221: | GENERAL | Queueing kernel fill with local size 32 x 1 x 1 group sizes 1 x 1 x 1... [2022-07-06 04:18:58.699367414]POCL: in fn POclRetainCommandQueue at line 33: | REFCOUNTS | Retain Command Queue 0x559651385970 : 4 [2022-07-06 04:18:58.699376473]POCL: in fn pocl_create_event at line 514: | EVENTS | Created event 0x55965144f190 / ID 2 / Command ndrange_kernel [2022-07-06 04:18:58.699384689]POCL: in fn pocl_create_command_struct at line 630: | EVENTS | event pointer provided [2022-07-06 04:18:58.699394485]POCL: in fn pocl_create_event_sync at line 530: | EVENTS | create event sync: waiting 2 , notifier 1 [2022-07-06 04:18:58.699403512]POCL: in fn pocl_create_command_struct at line 650: | EVENTS | Created command struct: CMD 0x559650ddf460 (event 2 / 0x55965144f190, type: ndrange_kernel) [2022-07-06 04:18:58.699413923]POCL: in fn POclRetainKernel at line 33: | REFCOUNTS | Retain Kernel 0x5596513926f0 : 2 [2022-07-06 04:18:58.699422964]POCL: in fn pocl_command_enqueue at line 1061: | EVENTS | In-order Q; adding event syncs [2022-07-06 04:18:58.699429314]POCL: in fn pocl_create_event_sync at line 530: | EVENTS | create event sync: waiting 2 , notifier 1 [2022-07-06 04:18:58.699438005]POCL: in fn pocl_create_event_sync at line 543: | EVENTS | Skipping event sync creation [2022-07-06 04:18:58.699446350]POCL: in fn pocl_create_event_sync at line 530: | EVENTS | create event sync: waiting 2 , notifier 1 [2022-07-06 04:18:58.699454328]POCL: in fn pocl_create_event_sync at line 543: | EVENTS | Skipping event sync creation [2022-07-06 04:18:58.699462708]POCL: in fn pocl_command_enqueue at line 1105: | EVENTS | Pushed Event 2 to CQ 5. [2022-07-06 04:18:58.699470964]POCL: in fn pocl_update_event_queued at line 1922: | EVENTS | Event queued: 2 [2022-07-06 04:18:58.699489571]POCL: in fn pocl_update_event_submitted at line 1942: | EVENTS | Event submitted: 2 ocl-icd(ocl_icd_loader_gen.c:2513): clEnqueueNDRangeKernel: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1765): clRetainCommandQueue: Entering [2022-07-06 04:18:58.700081115]POCL: in fn POclRetainCommandQueue at line 33: | REFCOUNTS | Retain Command Queue 0x559651385970 : 5 ocl-icd(ocl_icd_loader_gen.c:1771): clRetainCommandQueue: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:1791): clGetCommandQueueInfo: Entering ocl-icd(ocl_icd_loader_gen.c:1797): clGetCommandQueueInfo: return: 0/0x0 ocl-icd(ocl_icd_loader_gen.c:3310): clSVMAlloc: Entering SVM cuMemAllocManaged 160 before write 0 0x4204061000 Bus error (core dumped) i.e., it seems to successfully allocate memory with |
I obviously can't guarantee that that's what at issue here, but I suspect you'll need pocl/pocl#1069 (or another fix for the same issue) in order to allow this to work. I'm actually sort of surprised pocl-pthreads worked. If that doesn't help, a backtrace and potentially the first meaningful thing flagged by valgrind would be of use. |
I'm sorry, I should have been clearer. I used inducer/pyopencl#452 and pocl/pocl#1069 (for POCL-pthreads and POCL-cuda) for all tests. Note that my failing example above doesn't even use loopy (the loopy test in this PR just exposed it), so this PR might be the wrong location to track this issue.
Here is a part of the backtrace at the point of the crash. It doesn't appear to be too useful though. SVM cuMemAllocManaged 160
before write 0 0x4204061000
--Type <RET> for more, q to quit, c to continue without paging--q
Thread 1 "python" received signal SIGBUS, Bus error.
0x00007fff8510e39c in pocl_cuda_svm_alloc (dev=0x555555f27020, flags=1, size=160)
at /home/mdiener/Work/pocl/lib/CL/devices/cuda/pocl-cuda.c:1895
1895 ((double*)dptr)[0] = 42.0;
(gdb) bt
#0 0x00007fff8510e39c in pocl_cuda_svm_alloc (dev=0x555555f27020, flags=1, size=160)
at /home/mdiener/Work/pocl/lib/CL/devices/cuda/pocl-cuda.c:1895
#1 0x00007fff91c7cfb6 in POclSVMAlloc (context=0x5555564b86d0, flags=1, size=160, alignment=128)
at /home/mdiener/Work/pocl/lib/CL/clSVMAlloc.c:98
#2 0x00007fff9220638f in clSVMAlloc () from /shared/home/mdiener/Work/emirge/miniforge3/envs/poclbuild/lib/libOpenCL.so.1
#3 0x00007fff922bb337 in pybind11::cpp_function::initialize<pybind11::detail::initimpl::constructor<std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*>::execute<pybind11::class_<pyopencl::svm_allocation>, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v, 0>(pybind11::class_<pyopencl::svm_allocation>&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&)::{lambda(pybind11::detail::value_and_holder&, std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*)#1}, void, pybind11::detail::value_and_holder&, std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::detail::is_new_style_constructor, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg, pybind11::arg_v>(pybind11::class_<pyopencl::svm_allocation>&&, void (*)(pybind11::detail::value_and_holder&, std::shared_ptr<pyopencl::context>, unsigned long, unsigned int, unsigned long, pyopencl::command_queue const*), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::detail::is_new_style_constructor const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg const&, pybind11::arg_v const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) ()
from /shared/home/mdiener/Work/pyopencl/pyopencl/_cl.cpython-310-x86_64-linux-gnu.so
#4 0x00007fff922494a6 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) ()
from /shared/home/mdiener/Work/pyopencl/pyopencl/_cl.cpython-310-x86_64-linux-gnu.so
#5 0x000055555569850c in cfunction_call (func=0x7fff9240e1b0, args=<optimized out>, kwargs=<optimized out>)
at /usr/local/src/conda/python-3.10.5/Objects/methodobject.c:543
#6 0x00005555556a6db9 in _PyObject_Call (kwargs=<optimized out>, args=0x7fff91e2cdc0, callable=0x7fff9240e1b0, tstate=0x55555591be10)
at /usr/local/src/conda/python-3.10.5/Objects/call.c:305
[...] I modified pocl/pocl#1067 such that it tries to write to the just allocated buffer after a successful This is on koelsch with a Tesla K40c. |
With |
Btw, I agree that this discussion does not have much to do with Loopy. Maybe let's continue the discussion here: inducer/pyopencl#452. |
(Continuing the discussion here for a bit since I got the loopy test in this PR running with the change outlined below)
Hmm, that is interesting. Based on your comment, I tried
The device I'm running on (K40c) returns I found another fix (workaround?) in pocl/pocl@03ffc71 which just uses CUDA functions for the memfill operation. With that fix, my simple test and the test in this PR also work.
Here is a backtrace with my debug cruft removed:
It crashes in the memfill operation when accessing |
OK, nice. As far as I can tell, using CUDA to do the SVM fill is the correct thing to do. Otherwise it's neither queue-synchronized nor performed from the device. I don't believe |
f8a6477
to
92122b2
Compare
@matthiasdiener Please don't force-push to branches on which more than one person is working. Not only is there a risk of clobbering one another's work, it's also very hard to review what's being changed. |
b963eeb
to
ebbdc64
Compare
With the current version of mirgecom, and setting
Edit: This was resolved by rebasing Kaushik's branch of arraycontext. |
ddc2303
to
70bfb03
Compare
… struct Co-authored-by: Matthias Diener <[email protected]>
Needs: