
prompt generation very slow #41

Open
sipie800 opened this issue Jun 16, 2024 · 1 comment
Comments

@sipie800

For the 4096 tokens that Omost forces, a llama-3 model on a 4090 takes about 120 s to complete the prompt, while SD takes only 7 s. That's a big gap.
How can we accelerate the local LLM?

@zhaijunxiao
Contributor

The official LLM inference method runs slowly.
You can speed it up by serving the model with TGI or llama.cpp instead; see the sketch below.
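As a rough illustration of the llama.cpp route, here is a minimal sketch using the llama-cpp-python bindings. It assumes the Omost LLM has already been converted to GGUF; the file name, quantization level, and prompt are placeholders, not actual Omost release artifacts or settings:

```python
# Sketch: run the Omost LLM through llama.cpp instead of the stock
# transformers pipeline. Assumes a GGUF conversion of the model exists;
# the path below is hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="omost-llama-3-8b.Q4_K_M.gguf",  # hypothetical GGUF file
    n_gpu_layers=-1,  # offload all layers to the 4090
    n_ctx=4096,       # Omost prompts need the full 4096-token window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "generate an image of a cat on a sofa"}],
    max_tokens=4096,
)
print(out["choices"][0]["message"]["content"])
```

Quantized GGUF weights with full GPU offload typically generate tokens much faster than an unquantized `model.generate()` loop on a single consumer GPU; TGI gets a similar speedup through continuous batching and optimized kernels.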
