Very low GPU memory usage when accelerating omost-llama-3-8b-Q8_0-GGUF with llama.cpp; the GPU is not being used #55

zk19971101 · 2024-08-05T06:41:21Z

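I downloaded the llama.cpp source and compiled it with make GGML_CUDA=1.
I set -ngl 35 and ran inference with ./llama-cli -m /home/liquid/.cache/llama.cpp/omost-llama-3-8b-q8_0.gguf -ngl 35 --prompt "who are you?". GPU utilization stayed very low and inference was very slow, so the GPU does not appear to be used for inference at all.
How can I fix this? Is it a problem with how I compiled llama.cpp, or does omost-llama-3-8b-Q8_0-GGUF not support GPU acceleration?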
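A quick sanity check is to estimate how much VRAM a full offload should use and compare it with what nvidia-smi reports while llama-cli is running. The sketch below assumes the model has roughly 8.03e9 parameters and 32 transformer blocks (as in Llama-3-8B, which this model is assumed to be based on), and that llama.cpp counts the output layer as one extra offloadable layer; these figures are assumptions, not taken from the issue. Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights.

```python
# Back-of-the-envelope check for full GPU offload of a Q8_0 GGUF model.
# Assumptions (not from the issue): ~8.03e9 parameters, 32 transformer
# blocks, and one extra offloadable output layer counted by llama.cpp.

Q8_0_BLOCK_WEIGHTS = 32      # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 32 + 2    # 32 int8 quants + one fp16 scale


def q8_0_weight_bytes(n_params: float) -> float:
    """Approximate size of the Q8_0-quantized weights in bytes."""
    return n_params * Q8_0_BLOCK_BYTES / Q8_0_BLOCK_WEIGHTS


def layers_offloaded(ngl: int, n_blocks: int = 32) -> tuple[int, int]:
    """Return (offloaded, total), assuming total = n_blocks + 1 output layer."""
    total = n_blocks + 1
    return min(ngl, total), total


if __name__ == "__main__":
    gib = q8_0_weight_bytes(8.03e9) / 2**30
    done, total = layers_offloaded(35)
    print(f"expected weights on GPU: ~{gib:.1f} GiB")
    print(f"-ngl 35 -> {done}/{total} layers offloaded")
```

Under these assumptions -ngl 35 already covers every layer, so a fully offloaded run should hold close to 8 GiB of weights on the GPU (plus KV cache and compute buffers). If nvidia-smi shows far less than that for the llama-cli process, the weights are most likely not reaching the GPU, which points at the build rather than at the model file.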
