
prompt generation very slow #41

Open
sipie800 opened this issue Jun 16, 2024 · 1 comment
Comments

@sipie800

For the 4096 tokens that Omost forces, a llama-3 model on a 4090 takes about 120 s to complete the prompt, while SD takes only 7 s. That's a big gap.
How can we accelerate the local LLM?

@zhaijunxiao
Contributor

The official LLM inference method runs slowly.
You can speed it up by serving the model with TGI or llama.cpp instead; see the sketch below.
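As a rough illustration of the llama.cpp route, here is a minimal sketch using the llama-cpp-python bindings. It assumes the Omost LLM has already been converted to GGUF; the file name, quantization level, and prompt are placeholders, not actual Omost release artifacts or settings:

```python
# Sketch: run the Omost LLM through llama.cpp instead of the stock
# transformers pipeline. Assumes a GGUF conversion of the model exists;
# the path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="omost-llama-3-8b.Q4_K_M.gguf",  # hypothetical GGUF file
    n_gpu_layers=-1,  # offload all layers to the 4090
    n_ctx=4096,       # Omost prompts need the full 4096-token window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "generate an image of a cat on a sofa"}],
    max_tokens=4096,
)
print(out["choices"][0]["message"]["content"])
```

Quantized GGUF weights with full GPU offload typically generate tokens much faster than an unquantized `model.generate()` loop on a single consumer GPU; TGI gets a similar speedup through continuous batching and optimized kernels.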
