
Bug: Segmentation Fault during CUDA Initialization with GPU Offloading Enabled #696

Open
FordUniver opened this issue Feb 17, 2025 · 5 comments

Comments

@FordUniver

Contact Details

No response

What happened?

Segmentation Fault during CUDA Initialization with GPU Offloading Enabled

Description:
When running the binary with GPU offloading enabled (e.g., using -ngl 1), the application crashes with a segmentation fault at address 0x328. Running the binary without GPU support (e.g., using --gpu disable) works correctly. The logs indicate that the crash occurs during CUDA initialization, suggesting a possible null pointer dereference or misconfiguration during the dynamic linking of the CUDA module.

Environment:

  • OS: Linux (Debian-based, Cosmopolitan 4.0.2, kernel 6.1.x)
  • GPU: NVIDIA A100 (or similar)
  • Driver/CUDA: NVIDIA driver version 535.x; CUDA Toolkit version 12.x
  • CUDA Installation: Installed in a custom location (configured via environment variables)
  • Build System: Cosmocc toolchain with Make
  • Model: Qwen2.5-0.5B-Instruct-GGUF (a small model with no expected GPU memory issues)
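
Quick ways to confirm the versions and paths listed above (standard tools, noted here only to aid reproduction):

    nvidia-smi                              # driver version and GPU model
    nvcc --version                          # CUDA toolkit version
    echo "$CUDA_PATH" "$LD_LIBRARY_PATH"    # custom install location actually in effect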

Steps to Reproduce:

  1. Set the environment variables:

    export CUDA_PATH=<CUSTOM_CUDA_PATH>
    export CUDA_HOME=<CUSTOM_CUDA_PATH>
    export CUDA_INC_PATH=<CUSTOM_CUDA_PATH>/include
    export PATH="<CUSTOM_CUDA_PATH>/bin:$PATH"
    export LD_LIBRARY_PATH="<CUSTOM_CUDA_PATH>/lib64:$LD_LIBRARY_PATH"

  2. Build the project:

    make -j8

  3. Run the binary with GPU offloading enabled:

    ./o/llama.cpp/main -m /path/to/model.gguf -ngl 999

The binary crashes with a segmentation fault (see error below). The crash occurs consistently when any GPU offloading is enabled—even a minimal layer count (e.g., -ngl 1) triggers the fault. Running with --gpu disable allows the model to load and operate normally. The crash address (0x328) and early log messages hint at a potential issue in the CUDA initialization code (referenced in llama.cpp/ggml-cuda.cu and llama.cpp/ggml-cuda.h).
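
For reference, a minimal sketch of the two invocations being compared (the model path is a placeholder; the flags are the ones described above):

    # crashes: any amount of GPU offloading
    ./o/llama.cpp/main -m /path/to/model.gguf -ngl 1

    # works: GPU support disabled entirely
    ./o/llama.cpp/main -m /path/to/model.gguf --gpu disable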

Any assistance or direction would be greatly appreciated.

Version

llamafile v0.9.0

What operating system are you seeing the problem on?

No response

Relevant log output

██╗     ██╗      █████╗ ███╗   ███╗ █████╗ ███████╗██╗██╗     ███████╗
██║     ██║     ██╔══██╗████╗ ████║██╔══██╗██╔════╝██║██║     ██╔════╝
██║     ██║     ███████║██╔████╔██║███████║█████╗  ██║██║     █████╗
██║     ██║     ██╔══██║██║╚██╔╝██║██╔══██║██╔══╝  ██║██║     ██╔══╝
███████╗███████╗██║  ██║██║ ╚═╝ ██║██║  ██║██║     ██║███████╗███████╗
╚══════╝╚══════╝╚═╝  ╚═╝╚═╝     ╚═╝╚═╝  ╚═╝╚═╝     ╚═╝╚══════╝╚══════╝
 launching server...
error: Uncaught SIGSEGV (SEGV_MAPERR) at 0x328 on coder-cspiegel-gpu-dc4f9657c-lq6mx pid 50320 tid 50324
  ./main
  No error information
  Linux Cosmopolitan 4.0.2 MODE=x86_64; #1 SMP PREEMPT_DYNAMIC Debian 6.1.119-1 (2024-11-22) coder-cspiegel-gpu-dc4f9657c-lq6mx 6.1.0-28-amd64

RAX 0000000000000320 RBX 000000007fcbffe0 RDI 0000000000000001
RCX 00007f9a6f842880 RDX 0000000000000001 RSI 0000000000000000
RBP 00007f9a6bc88940 RSP 00007f9a6bc888f8 RIP 00007f9a6f8d3ae8
 R8 0000000005c00000  R9 00007f9a6bc88cf0 R10 0000000000100000
R11 0000000000000001 R12 00007f9a6bc88968 R13 00007f9a6bc88968
R14 0000000000000001 R15 0000000000000000
TLS 00007f9a60482b40

XMM0  00000000000000000000000000000000 XMM8  00000000000000000000000000000000
XMM1  00000000000000000000000000000000 XMM9  ffffffffffffffffffffffffffffffff
XMM2  00000000000000000000000000000190 XMM10 ffffffffffffffffffffffffffffffff
XMM3  00007f9a6bf06e4000007f9a6bf07e10 XMM11 00000000000000000000000000000000
XMM4  000000000000000000007f9a6bf07b70 XMM12 00000000000000000000000000000000
XMM5  00000000000000000000000000000000 XMM13 ffffffffffffffffffffffffffffffff
XMM6  6e6577512f736c65646f6d2f73746365 XMM14 00000000000000000000000000000000
XMM7  6a6f72702f74327369612f617461642f XMM15 00000000000000000000000000000000

cosmoaddr2line /home/htc/cspiegel/repositories/llamafile/o/llama.cpp/main/main.com.dbg 7f9a6f8d3ae8 7f9a6bfd0389 7f9a6bfc5982 7f9a6bfa1c08 7f9a6bfe66db 7f9a6bf1eb7b  7f9a6f842880

7f9a6bc85e40 7f9a6f8d3ae8 NULL+0
7f9a6bc88940 7f9a6bfd0389 NULL+0
7f9a6bc88990 7f9a6bfc5982 NULL+0
7f9a6bc889b0 7f9a6bfa1c08 NULL+0
7f9a6bc889f0 7f9a6bfe66db NULL+0
7f9a6bc88b00 7f9a6bf1eb7b NULL+0
<dangerous frame>

000000400000-000000ae31e0 r-xi- 7052kb
000000ae4000-000003252000 rw-i- 39mb
000003252000-0006fe000000       28gb
0006fe000000-0006fe001000 rw-pa 4096b
0006fe001000-7f9a0f9b9000       128tb
7f9a0f9b9000-7f9a27ffff60 r--s- 390mb
7f9a28000000-7f9a4e934000       617mb
7f9a4e934000-7f9a4eb34000 rw-pa 2048kb
7f9a4eb34000-7f9a4f800000       13mb
7f9a4f800000-7f9a50000000 rw-pa 8192kb
7f9a50000000-7f9a602a3000       259mb
7f9a602a3000-7f9a632a3000 rw-pa 48mb
7f9a632a3000-7f9a6bc77000       138mb
7f9a6bc77000-7f9a6bc78000 ---pa 4096b
7f9a6bc78000-7f9a6bc8c000 rw-pa 80kb
7f9a6bc8c000-7f9a6fa31000       62mb
7f9a6fa31000-7f9a6fa31980 rw-pa 2432b
7f9a6fa32000-7f9a6fa7e000       304kb
7f9a6fa7e000-7f9a6fbb85d0 rw-pa 1257kb
7f9a6fbb9000-7f9a6fcae3c8 r--s- 981kb
7f9a6fcaf000-7f9a6feef000 rw-pa 2304kb
7f9a6feef000-7ffe26769000       399gb
7ffe26769000-7ffe26869000 ---pa 1024kb
7ffe26869000-7ffe27069000 rw-pa 8192kb
# 532'811'776 bytes in 15 mappings
@FordUniver (Author)

I should mention that the precompiled binaries result in the same error.

@cjpais (Collaborator) commented Feb 17, 2025

I went and bisected this, and it looks like commit c293359, the upgrade to Cosmo 4.0, is the issue.

I just built on my machine with 3.9.7 and had no issues.
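
Roughly, the bisect loop looks like this (the known-good ref is a placeholder, and the test command just reuses the reproduction steps from the issue):

    git bisect start
    git bisect bad HEAD                     # current tree segfaults with -ngl 1
    git bisect good <last-known-good-ref>   # placeholder for any commit before the Cosmo 4.0 upgrade
    # at each step, rebuild and test a single offloaded layer:
    make -j8 && ./o/llama.cpp/main -m /path/to/model.gguf -ngl 1
    git bisect good    # or `git bisect bad` if it segfaults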

@FordUniver (Author)

That seems to do the trick, thanks for the bisect! Is this a general issue with any CUDA installation or is there something unusual about my setup?
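
In case it helps others, the downgrade is just a rebuild of the same tree with the older toolchain. This is a sketch under the assumption that the build picks up whichever cosmocc is first on PATH, which may not match how your checkout pins the toolchain:

    # put a cosmocc 3.9.7 toolchain first on PATH (obtained however you normally get it)
    export PATH="$HOME/cosmocc-3.9.7/bin:$PATH"
    which cosmocc                 # confirm the 3.9.7 toolchain is the one being used
    make clean && make -j8
    ./o/llama.cpp/main -m /path/to/model.gguf -ngl 1    # should no longer segfault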

@cjpais (Collaborator) commented Feb 17, 2025

It should happen with every CUDA setup, as far as I'm aware.

@OEvgeny commented Mar 1, 2025

Same for me, even on the latest main.
