Replies: 2 comments
-
Make sure you are running the latest version of text-generation-webui. That sounds like a known bug with GPU generation in …
-
Maybe check: Is TextGen actually using your GPU in the Docker container?
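A few quick checks for that (a sketch; the container name `textgen` is a placeholder for whatever your container is called, and on Jetson devices `nvidia-smi` may not be available, in which case `tegrastats` on the host shows GPU load):

```
# Is the NVIDIA runtime exposed inside the container at all?
docker exec -it textgen nvidia-smi

# On Jetson, watch GR3D (GPU) utilization on the host while a prompt runs:
sudo tegrastats

# Check the webui's startup log for llama.cpp layer-offload messages,
# which indicate whether layers were actually placed on the GPU:
docker logs textgen 2>&1 | grep -i "offload"
```

If the GPU shows no activity during generation, the container is likely falling back to CPU, or the model was loaded with zero GPU layers.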
-
Hi,
I'm running a Docker container of text-generation-webui on an NVIDIA Jetson Orin NX 8GB.
I've installed the HACS addon and configured it to connect to the webui running on the Jetson device,
using the "Home-3B-v2.q4_k_m.gguf" model.
But all prompt replies are just a run of the same character, e.g. "3333333333333333333333...", "!!!!!!!!!!!!!...", or "GGGGGG...", capped at "max_tokens".
Interacting directly through the webui reproduces the same output.
The repeating character changes when I select a different "Preset" under "Parameters -> Generation".
If I run another model, like TheBloke/Llama-2-7B-Chat-GGUF (llama-2-7b-chat.Q4_K_M.gguf), I can chat with the AI completely normally.
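To isolate whether the problem is in the webui or in GPU inference with this particular model, one thing worth trying is running the GGUF directly with the llama.cpp CLI, once with GPU offload and once with `-ngl 0` (CPU only). This is a sketch; the binary name and model path depend on your build and layout:

```
# GPU offload (offload as many layers as fit):
./main -m Home-3B-v2.q4_k_m.gguf -p "Hello, who are you?" -n 64 -ngl 99

# CPU only, for comparison:
./main -m Home-3B-v2.q4_k_m.gguf -p "Hello, who are you?" -n 64 -ngl 0
```

If the CPU run produces coherent text while the GPU run repeats a single character, that points at a GPU-generation bug rather than at the model file or the HACS addon.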