I compiled llamafile from the current GitHub version without issues on a Linux system (Fedora 40).
-> llamafile --version
llamafile v0.9.0
Calling
llamafile --server --v2 -m (whatever .gguf file)
works fine
GPU support, however, does not work.
I have installed the NVIDIA GPU drivers and the NVIDIA CUDA toolkit, and everything seems to be correctly installed.
The nvidia-smi command reports:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.77                 Driver Version: 565.77         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA RTX A6000               Off |   00000000:89:00.0 Off |                  Off |
| 30%   22C    P8              21W / 300W |        2MiB / 49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory  |
|        ID   ID                                                               Usage       |
|=========================================================================================|
|  No running processes found                                                              |
+-----------------------------------------------------------------------------------------+
which seems OK.
When invoking
llamafile --server --v2 -ngl 999 --gpu NVIDIA -m (whatever .gguf file)
I get the following messages:
import_cuda_impl: initializing gpu module...
extract_cuda_dso: note: prebuilt binary /zip/ggml-cuda.so not found
compile_nvidia: note: building ggml-cuda with nvcc -arch=native...
llamafile_log_command: /usr/local/cuda/bin/nvcc -arch=native -std=c++11 -O3 --shared --use_fast_math -Xcudafe --diag_suppress=177 -Xcudafe --diag_suppress=940 -Xcudafe --diag_suppress=1305 --forward-unknown-to-host-compiler --compiler-options "-fPIC -O3 -march=native -mtune=native -std=c++11 -Wno-unused-function -Wno-unused-result -Wno-return-type -Wno-pedantic" -DNDEBUG -DGGML_BUILD=1 -DGGML_SHARED=1 -DGGML_MULTIPLATFORM -DGGML_CUDA_DMMV_X=32 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_MINIMIZE_CODE_SIZE -o /home/jdomingo/.llamafile/v/0.9.0/ggml-cuda.so.wk3hit /home/jdomingo/.llamafile/v/0.9.0/ggml-cuda.cu -lcublas -lcuda
/usr/include/c++/14/type_traits(1610): error: "__is_nothrow_new_constructible" is not a function or static data member
constexpr bool __is_nothrow_new_constructible
(followed by many more type_traits errors like this one...)
So the compilation of the CUDA code fails.
Suspecting an issue with the C++ standard version, I modified llamafile/cuda.c to replace every reference to -std=c++11 with -std=c++14 and recompiled llamafile. This time the CUDA code compiled (although it took a really long time) and the executables and the .so dynamic library were generated in my local .llamafile/v directory. But when I ran llamafile again, launching a prompt produced a SIGSEGV. The server did not crash, but it never answered anything. I could recompile llamafile this way and reproduce the error exactly if someone thinks it would help. The problem is that by doing so I am changing the intended way of compiling llamafile, and I am afraid this puts me ahead of what the maintainers can currently support.
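For clarity, the change I made was just a textual substitution inside llamafile/cuda.c, equivalent to running the following from the repository root (illustrative form of the same edit):
sed -i 's/-std=c++11/-std=c++14/g' llamafile/cuda.c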
Then, my questions are:
Could the segmentation violation be related to the use of C++14 instead of the intended C++11?
If so, what can be done to keep my system from using /usr/include/c++/14/type_traits, which seems to be intended exclusively for C++14?
If other tests are needed, or there are other files or command outputs I should provide, please let me know.
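For example, if it would help, I could check whether the same type_traits errors appear when compiling a trivial CUDA file with the same standard flag, independently of llamafile (a hypothetical test I have not run yet):
echo 'int main() { return 0; }' > /tmp/minimal.cu
/usr/local/cuda/bin/nvcc -arch=native -std=c++11 -O3 -c /tmp/minimal.cu -o /tmp/minimal.o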
Thanks a lot in advance
Juan