I compiled llamafile from the current GitHub version without issues on a Linux system (Fedora 40).
-> llamafile --version
llamafile v0.9.0
Calling
llamafile --server --v2 -m (whatever .gguf file)
works fine
GPU support, however, does not work.
I have installed the NVIDIA GPU drivers and the NVIDIA CUDA toolkit, and everything seems to be correctly installed.
The nvidia-smi command reports:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.77                 Driver Version: 565.77         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:89:00.0 Off |                  Off |
| 30%   22C    P8              21W / 300W |        2MiB / 49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory  |
|        ID   ID                                                               Usage       |
|=========================================================================================|
|  No running processes found                                                              |
+-----------------------------------------------------------------------------------------+
which seems OK.
When invoking
llamafile --server --v2 -ngl 999 --gpu NVIDIA -m (whatever .gguf file)
I get the following messages:
import_cuda_impl: initializing gpu module...
extract_cuda_dso: note: prebuilt binary /zip/ggml-cuda.so not found
compile_nvidia: note: building ggml-cuda with nvcc -arch=native...
llamafile_log_command: /usr/local/cuda/bin/nvcc -arch=native -std=c++11 -O3 --shared --use_fast_math -Xcudafe --diag_suppress=177 -Xcudafe --diag_suppress=940 -Xcudafe --diag_suppress=1305 --forward-unknown-to-host-compiler --compiler-options "-fPIC -O3 -march=native -mtune=native -std=c++11 -Wno-unused-function -Wno-unused-result -Wno-return-type -Wno-pedantic" -DNDEBUG -DGGML_BUILD=1 -DGGML_SHARED=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_MINIMIZE_CODE_SIZE -o /home/jdomingo/.llamafile/v/0.9.0/ggml-cuda.so.wk3hit /home/jdomingo/.llamafile/v/0.9.0/ggml-cuda.cu -lcublas -lcuda
/usr/include/c++/14/type_traits(1610): error: "__is_nothrow_new_constructible" is not a function or static data member
constexpr bool __is_nothrow_new_constructible
(followed by many more type_traits errors like this one...)
So the compilation of the CUDA code fails.
Suspecting an issue with the C++ standard version, I modified llamafile/cuda.c to replace every reference to -std=c++11 with -std=c++14 and recompiled llamafile. This time the CUDA code compiled (although it took a really long time) and the executables and the .so dynamic library were generated in my local .llamafile/v directory. But when I ran llamafile again, launching a prompt produced a SIGSEGV. The server did not crash, but it never answered anything. I could recompile llamafile this way and reproduce the error exactly if someone thinks it would help. The problem is that by doing so I am changing the intended way of compiling llamafile, and I am afraid this puts me ahead of what the maintainers can currently support.
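For clarity, the change I made was just a textual substitution inside llamafile/cuda.c, equivalent to running the following from the repository root (illustrative form of the same edit):
sed -i 's/-std=c++11/-std=c++14/g' llamafile/cuda.c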
Then, my questions are:
Could the segmentation violation be related to the use of C++14 instead of the intended C++11?
If so, what can be done to keep my system from using /usr/include/c++/14/type_traits, which seems to be intended exclusively for C++14?
If other tests are needed, or there are other files or command outputs I should provide, please let me know.
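For example, if it would help, I could check whether the same type_traits errors appear when compiling a trivial CUDA file with the same standard flag, independently of llamafile (a hypothetical test I have not run yet):
echo 'int main() { return 0; }' > /tmp/minimal.cu
/usr/local/cuda/bin/nvcc -arch=native -std=c++11 -O3 -c /tmp/minimal.cu -o /tmp/minimal.o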
Thanks a lot in advance
Juan