Unclear error in chat-ui when exceeding context window with vllm #1684

Open

ishatalkin opened this issue Feb 6, 2025 · 0 comments

Labels
bug Something isn't working
Bug description

The chat-ui application surfaces an unclear error when the context window is exceeded while working with vLLM. The user sees a "400 status code (no body)" error. The expected error is the one vLLM actually responds with: "This model's maximum context length is 2048 tokens. However, you requested 2723 tokens (1699 in the messages, 1024 in the completion). Please reduce the length of the messages or completion."

Steps to reproduce

  1. Set up chat-ui to work with vLLM. For easier testing, run vLLM with a small context window (--max-model-len 100); a sample launch command is shown after this list.
  2. Input a large amount of text, or continue the conversation until the context window limit is exceeded.
  3. Observe the error message in the UI.
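
For example, vLLM's OpenAI-compatible server can be started like this (a sketch only; the model name is taken from the config below, and any equivalent launch method will do):

    vllm serve Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4 --max-model-len 100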

Screenshots

(Screenshot: the "400 status code (no body)" error shown in the chat-ui interface)

Context

vLLM response

    {
      "object": "error",
      "message": "This model's maximum context length is 2048 tokens. However, you requested 2723 tokens (1699 in the messages, 1024 in the completion). Please reduce the length of the messages or completion.",
      "type": "BadRequestError",
      "param": null,
      "code": 400
    }
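
For reference, the masked error can be reproduced directly with the openai Node client against the same endpoint (a minimal sketch; the prompt text and max_tokens are placeholder values chosen to exceed the context window):

    import OpenAI from "openai";

    const client = new OpenAI({ baseURL: "http://127.0.0.1:8000/v1", apiKey: "EMPTY" });

    try {
      await client.chat.completions.create({
        model: "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
        messages: [{ role: "user", content: "text long enough to exceed --max-model-len" }],
        max_tokens: 1024,
      });
    } catch (err) {
      // Prints "400 status code (no body)", even though vLLM returned the
      // detailed JSON error shown above.
      console.log(err.message);
    }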

Specs

  • OS: Ubuntu 24.04.1 LTS
  • Browser: Chrome
  • chat-ui commit: version 0.9.4

Config

MODELS=`[
  {
    "name": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "tokenizer": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "description": "The latest Qwen open model with improved role-playing, long text generation and structured data understanding.",
    "modelUrl": "https://huggingface.co/Qwen/Qwen2.5-14B-Instruct",
    "websiteUrl": "https://qwenlm.github.io/blog/qwen2.5/",
    "logoUrl": "https://huggingface.co/datasets/huggingchat/models-logo/resolve/main/qwen-logo.png",
    "preprompt": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.",
    "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
	  "stop": ["<|endoftext|>", "<|im_end|>"],
	  "temperature": 0.6,
	  "max_new_tokens": 3072
    },
    "endpoints": [{
      "type" : "openai",
      "baseURL": "http://127.0.0.1:8000/v1"
    }],
  },
]`

Notes

It seems that the problem is in the openai package.

File core.mjs, method makeRequest, lines 306-308:

    const errText = await response.text().catch((e) => castToError(e).message);
    const errJSON = safeJSON(errText);
    const errMessage = errJSON ? undefined : errText; // JSON is just ignored... Why?

In the last line, the parsed JSON answer is discarded as the message. errJSON is still forwarded to APIError.generate, but that method extracts only the nested errJSON?.["error"] field, and vLLM returns its error fields at the top level, so both error and message arrive as undefined. This causes the empty body in APIError.makeMessage:

    static makeMessage(status, error, message) {
        const msg = error?.message ?
            typeof error.message === 'string' ?
                error.message
                : JSON.stringify(error.message)
            : error ? JSON.stringify(error)
                : message;
        if (status && msg) {
            return `${status} ${msg}`;
        }
        if (status) {
            return `${status} status code (no body)`; // this error is returned, because msg is undefined
        }
        if (msg) {
            return msg;
        }
        return '(no status code or body)';
    }
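
A possible workaround (a sketch of one option, not the actual openai package code) would be to keep the raw body as the fallback message whenever the parsed JSON lacks the nested "error" object the client expects:

    const errText = await response.text().catch((e) => castToError(e).message);
    const errJSON = safeJSON(errText);
    // Sketch: only discard the raw body when the JSON actually carries the
    // nested "error" object that APIError.generate knows how to unwrap.
    const errMessage = errJSON?.error ? undefined : errText;

With this change, makeMessage would receive the full vLLM JSON string as msg and return "400 {...}" instead of "400 status code (no body)".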