Describe the bug
cuda-memcheck reports scrolling errors on example-mnist-classification like this:
========= Invalid __global__ write of size 4
========= at 0x00001780 in void copy_kernel<float>(cublasCopyParams<float>)
========= by thread (191,0,0) in block (0,0,0)
========= Address 0x7fd319043efc is out of bounds
To Reproduce
Steps to reproduce the behaviour:
cargo build
cuda-memcheck target/debug/example-mnist-classification mnist linear
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57       Driver Version: 515.57       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   51C    P8     5W /  N/A |      4MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1936      G   /usr/lib/Xorg                       4MiB |
+-----------------------------------------------------------------------------+
Additional context
Note that running example-mnist-classification without cuda-memcheck works just fine and is able to converge. I only discovered this while working on #159, where training with CUDA crashes with CUDA_ERROR_ILLEGAL_ADDRESS when trying to copy from GPU to host. Not sure it's the same issue, but it seems related.
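For triage, here is a minimal sketch of the kind of check that could narrow this down. It assumes (not confirmed anywhere in this report) that the out-of-bounds write comes from a BLAS-style copy being asked for more elements than the destination buffer holds. A strided copy of n elements with increment inc touches 1 + (n - 1) * |inc| elements, so both buffers must be at least that long. The names `required_len` and `check_copy_extents` are hypothetical helpers, not part of this crate or of cuBLAS.

```rust
/// Minimum number of elements a BLAS-style strided vector must hold
/// to address `n` logical elements with increment `inc`.
fn required_len(n: usize, inc: isize) -> usize {
    if n == 0 {
        0
    } else {
        1 + (n - 1) * inc.unsigned_abs()
    }
}

/// Hypothetical pre-flight check to run on the host before issuing a copy.
/// `x_len` and `y_len` are the element counts of the source and destination
/// allocations; nothing here talks to the GPU.
fn check_copy_extents(
    n: usize,
    x_len: usize,
    incx: isize,
    y_len: usize,
    incy: isize,
) -> Result<(), String> {
    let need_x = required_len(n, incx);
    let need_y = required_len(n, incy);
    if x_len < need_x {
        return Err(format!("source too small: need {need_x}, have {x_len}"));
    }
    if y_len < need_y {
        return Err(format!("destination too small: need {need_y}, have {y_len}"));
    }
    Ok(())
}

fn main() {
    // Illustrative sizes only; they are not taken from the failing run.
    // Copying 10 elements into an 8-element destination is rejected.
    assert!(check_copy_extents(10, 10, 1, 8, 1).is_err());
    // Matching extents pass.
    assert!(check_copy_extents(10, 10, 1, 10, 1).is_ok());
    // A destination stride of 2 needs 1 + 9 * 2 = 19 elements.
    assert!(check_copy_extents(10, 10, 1, 18, 2).is_err());
    assert!(check_copy_extents(10, 10, 1, 19, 2).is_ok());
}
```

If a check like this fires in a debug build right before the copy call, that would point at the caller's size computation rather than at the library. If it never fires, the assumption above is probably wrong and the cause lies elsewhere.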
Expected behavior
No errors.