GPU files cannot be compiled #691

Open
JdMDE opened this issue Feb 7, 2025 · 0 comments

JdMDE commented Feb 7, 2025

I compiled llamafile from the current GitHub version without issues on a Linux system (Fedora 40):
-> llamafile --version
llamafile v0.9.0

Running
llamafile --server --v2 -m (whatever .gguf file)
works fine.

GPU support, however, does not work.
I have installed the NVIDIA GPU drivers and the NVIDIA CUDA toolkit, and everything appears to be correctly installed.
The nvidia-smi command reports:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.77                 Driver Version: 565.77         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:89:00.0 Off |                  Off |
| 30%   22C    P8             21W /  300W |       2MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
which seems OK.

When invoking
llamafile --server --v2 -ngl 999 --gpu NVIDIA -m (whatever .gguf file)
I get the following messages:

import_cuda_impl: initializing gpu module...
extract_cuda_dso: note: prebuilt binary /zip/ggml-cuda.so not found
compile_nvidia: note: building ggml-cuda with nvcc -arch=native...
llamafile_log_command: /usr/local/cuda/bin/nvcc -arch=native -std=c++11 -O3 --shared --use_fast_math -Xcudafe --diag_suppress=177 -Xcudafe --diag_suppress=940 -Xcudafe --diag_suppress=1305 --forward-unknown-to-host-compiler --compiler-options "-fPIC -O3 -march=native -mtune=native -std=c++11 -Wno-unused-function -Wno-unused-result -Wno-return-type -Wno-pedantic" -DNDEBUG -DGGML_BUILD=1 -DGGML_SHARED=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_MINIMIZE_CODE_SIZE -o /home/jdomingo/.llamafile/v/0.9.0/ggml-cuda.so.wk3hit /home/jdomingo/.llamafile/v/0.9.0/ggml-cuda.cu -lcublas -lcuda
/usr/include/c++/14/type_traits(1610): error: "__is_nothrow_new_constructible" is not a function or static data member
constexpr bool __is_nothrow_new_constructible

(followed by many more type_traits errors like this one...)

So the CUDA code fails to compile.

Suspecting a problem with the C++ standard version, I modified llamafile/cuda.c to replace every occurrence of -std=c++11 with -std=c++14 and rebuilt llamafile. This time the CUDA code compiled (although it took a very long time), and the executables and the .so shared library were generated in my local ~/.llamafile/v directory. However, when I ran llamafile again, sending a prompt triggered a SIGSEGV (segmentation fault). The server did not crash, but it never answered. I can rebuild llamafile this way and reproduce the error exactly if someone thinks that would help. The problem is that doing so departs from the intended way of compiling llamafile, and I am afraid it would be beyond what the maintainers can currently help with.
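For anyone who wants to reproduce the modified build, the source change described above can be scripted. This is a minimal sketch, not part of llamafile: the helper name bump_cxx_std is hypothetical, and it assumes GNU sed (as shipped with Fedora). The trailing g matters because the nvcc command in the log passes -std=c++11 twice, once to nvcc itself and once to the host compiler inside --compiler-options.

```shell
#!/bin/sh
# Hypothetical helper (not part of llamafile): rewrite every "-std=c++11"
# in the given file to "-std=c++14", keeping a ".orig" copy for reverting.
# Assumes GNU sed, where "-i" with no suffix edits the file in place.
bump_cxx_std() {
  cp -- "$1" "$1.orig" || return 1
  # In a basic regular expression "+" is literal, so "c++11" matches as-is;
  # the trailing "g" replaces every occurrence on a line, not just the first.
  sed -i 's/-std=c++11/-std=c++14/g' "$1"
}

# Usage, from the root of the llamafile source tree:
#   bump_cxx_std llamafile/cuda.c
# then rebuild llamafile the same way as before.
```

Restoring llamafile/cuda.c from the .orig copy undoes the change before going back to the intended C++11 build.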

So my questions are:

  • Could the segmentation fault be related to using C++14 instead of the intended C++11?
  • If so, what can I do to stop my system from using /usr/include/c++/14/type_traits, which seems to be intended exclusively for C++14?

If other tests are needed, or if there are other files or command outputs I should provide, please let me know.

Thanks a lot in advance
Juan
