Segmentation Fault during CUDA Initialization with GPU Offloading Enabled
Description:
When running the binary with GPU offloading enabled (e.g., using -ngl 1), the application crashes with a segmentation fault at address 0x328. Running the binary without GPU support (e.g., using --gpu disable) works correctly. The logs indicate that the crash occurs during CUDA initialization, suggesting a possible null pointer dereference or misconfiguration during the dynamic linking of the CUDA module.
Environment:
OS: Linux (Debian-based, Cosmopolitan 4.0.2, kernel 6.1.x)
GPU: NVIDIA A100 (or similar)
Driver/CUDA: NVIDIA driver version 535.x; CUDA Toolkit version 12.x
CUDA Installation: Installed in a custom location (configured via environment variables)
Build System: Cosmocc toolchain with Make
Model: Qwen2.5-0.5B-Instruct-GGUF (a small model with no expected GPU memory issues)
The binary crashes with a segmentation fault (see error below). The crash occurs consistently when any GPU offloading is enabled—even a minimal layer count (e.g., -ngl 1) triggers the fault. Running with --gpu disable allows the model to load and operate normally. The crash address (0x328) and early log messages hint at a potential issue in the CUDA initialization code (referenced in llama.cpp/ggml-cuda.cu and llama.cpp/ggml-cuda.h).
Any assistance or direction would be greatly appreciated.
Version
llamafile v0.9.0
Contact Details
No response
What happened?
Steps to Reproduce:
1. Build the project (cosmocc toolchain with Make).
2. Run the binary with GPU offloading enabled (e.g., -ngl 1).
What operating system are you seeing the problem on?
No response
Relevant log output