Bug description
The chat-ui application shows an unclear error when the context window is exceeded while working with vLLM. The user sees a "400 status code (no body)" error. The expected error is the one vLLM actually responds with: "This model's maximum context length is 2048 tokens. However, you requested 2723 tokens (1699 in the messages, 1024 in the completion). Please reduce the length of the messages or completion."
Steps to reproduce
Set up chat-ui to work with vLLM. Run vLLM with a small context window to make testing easier (--max-model-len 100).
Input a large amount of text or continue the conversation until the request exceeds the context window.
Observe the error message in the UI.
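The same error can be reproduced outside of chat-ui with a few lines against the openai npm package (a minimal sketch; the baseURL and model name are the ones from the config below, and "EMPTY" is a dummy API key since vLLM does not require one unless started with --api-key):

// Standalone reproduction sketch using the openai npm package against the vLLM endpoint.
import OpenAI from "openai";

const client = new OpenAI({ baseURL: "http://127.0.0.1:8000/v1", apiKey: "EMPTY" });

try {
  await client.chat.completions.create({
    model: "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    messages: [{ role: "user", content: "a prompt long enough to overflow the context window" }],
    max_tokens: 1024,
  });
} catch (err) {
  // Logs "400 status code (no body)" instead of vLLM's detailed explanation.
  console.log(err.message);
}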
Screenshots
(Screenshot of the "400 status code (no body)" error in the chat-ui interface, omitted here.)
Context
vLLM response
{
  "object": "error",
  "message": "This model's maximum context length is 2048 tokens. However, you requested 2723 tokens (1699 in the messages, 1024 in the completion). Please reduce the length of the messages or completion.",
  "type": "BadRequestError",
  "param": null,
  "code": 400
}
Specs
OS: Ubuntu 24.04.1 LTS
Browser: Chrome
chat-ui commit: version 0.9.4
Config
MODELS=`[
  {
    "name": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "tokenizer": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "description": "The latest Qwen open model with improved role-playing, long text generation and structured data understanding.",
    "modelUrl": "https://huggingface.co/Qwen/Qwen2.5-14B-Instruct",
    "websiteUrl": "https://qwenlm.github.io/blog/qwen2.5/",
    "logoUrl": "https://huggingface.co/datasets/huggingchat/models-logo/resolve/main/qwen-logo.png",
    "preprompt": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.",
    "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
      "stop": ["<|endoftext|>", "<|im_end|>"],
      "temperature": 0.6,
      "max_new_tokens": 3072
    },
    "endpoints": [{
      "type": "openai",
      "baseURL": "http://127.0.0.1:8000/v1"
    }],
  },
]`
Notes
It seems that the problem is in the openai package, in file core.mjs, method makeRequest, lines 306-308:
const errText = await response.text().catch((e) => castToError(e).message);
const errJSON = safeJSON(errText);
const errMessage = errJSON ? undefined : errText; // JSON is just ignored... Why?
In the last line the parsed JSON answer is ignored, which results in an empty body in APIError.makeMessage:
static makeMessage(status, error, message) {
  const msg = error?.message ?
    typeof error.message === 'string' ?
      error.message
      : JSON.stringify(error.message)
    : error ? JSON.stringify(error)
    : message;
  if (status && msg) {
    return `${status} ${msg}`;
  }
  if (status) {
    return `${status} status code (no body)`; // this error is returned, because msg is undefined
  }
  if (msg) {
    return msg;
  }
  return '(no status code or body)';
}
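Since the "(no body)" branch is the one that fires, both error and message must be arriving empty here. A minimal sketch of the change this report is hinting at, assuming errMessage is what ultimately reaches makeMessage as the message argument (illustration only, not the package's actual code):

// Hypothetical adjustment: instead of discarding the parsed JSON, fall back to
// its top-level "message" field, which is where vLLM puts the human-readable text.
const errText = await response.text().catch((e) => castToError(e).message);
const errJSON = safeJSON(errText);
const errMessage = errJSON?.message ?? (errJSON ? undefined : errText);

// With the vLLM body above, makeMessage(400, error, errMessage) would then take the
// `${status} ${msg}` branch and return
// "400 This model's maximum context length is 2048 tokens. ..."
// instead of "400 status code (no body)".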