Bug description
The chat-ui application shows an unclear error when the context window is exceeded while working with vLLM. The user sees a "400 status code (no body)" error. The expected error is the one vLLM actually responds with: "This model's maximum context length is 2048 tokens. However, you requested 2723 tokens (1699 in the messages, 1024 in the completion). Please reduce the length of the messages or completion."
Steps to reproduce
Set up chat-ui to work with vLLM. Run vLLM with a small context window to make testing easier (--max-model-len 100).
Input a large amount of text or continue the conversation until the request exceeds the context window.
Observe the error message in the UI.
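The same error can be reproduced outside of chat-ui with a few lines against the openai npm package (a minimal sketch; the baseURL and model name are the ones from the config below, and "EMPTY" is a dummy API key since vLLM does not require one unless started with --api-key):

// Standalone reproduction sketch using the openai npm package against the vLLM endpoint.
import OpenAI from "openai";

const client = new OpenAI({ baseURL: "http://127.0.0.1:8000/v1", apiKey: "EMPTY" });

try {
  await client.chat.completions.create({
    model: "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    messages: [{ role: "user", content: "a prompt long enough to overflow the context window" }],
    max_tokens: 1024,
  });
} catch (err) {
  // Logs "400 status code (no body)" instead of vLLM's detailed explanation.
  console.log(err.message);
}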
Screenshots
(Screenshot of the "400 status code (no body)" error in the chat-ui interface, omitted here.)
Context
vLLM response
{
  "object": "error",
  "message": "This model's maximum context length is 2048 tokens. However, you requested 2723 tokens (1699 in the messages, 1024 in the completion). Please reduce the length of the messages or completion.",
  "type": "BadRequestError",
  "param": null,
  "code": 400
}
Specs
OS: Ubuntu 24.04.1 LTS
Browser: Chrome
chat-ui commit: version 0.9.4
Config
MODELS=`[
  {
    "name": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "tokenizer": "Qwen/Qwen2.5-14B-Instruct-GPTQ-Int4",
    "description": "The latest Qwen open model with improved role-playing, long text generation and structured data understanding.",
    "modelUrl": "https://huggingface.co/Qwen/Qwen2.5-14B-Instruct",
    "websiteUrl": "https://qwenlm.github.io/blog/qwen2.5/",
    "logoUrl": "https://huggingface.co/datasets/huggingchat/models-logo/resolve/main/qwen-logo.png",
    "preprompt": "You are Qwen, created by Alibaba Cloud. You are a helpful assistant.",
    "chatPromptTemplate": "<s>{{preprompt}}{{#each messages}}{{#ifUser}}<|user|>\n{{content}}<|end|>\n<|assistant|>\n{{/ifUser}}{{#ifAssistant}}{{content}}<|end|>\n{{/ifAssistant}}{{/each}}",
    "parameters": {
      "stop": ["<|endoftext|>", "<|im_end|>"],
      "temperature": 0.6,
      "max_new_tokens": 3072
    },
    "endpoints": [{
      "type": "openai",
      "baseURL": "http://127.0.0.1:8000/v1"
    }],
  },
]`
Notes
It seems that the problem is in the openai package, in file core.mjs, method makeRequest, lines 306-308:
const errText = await response.text().catch((e) => castToError(e).message);
const errJSON = safeJSON(errText);
const errMessage = errJSON ? undefined : errText; // JSON is just ignored... Why?
In the last line the parsed JSON answer is ignored, which results in an empty body in APIError.makeMessage:
static makeMessage(status, error, message) {
  const msg = error?.message ?
    typeof error.message === 'string' ?
      error.message
      : JSON.stringify(error.message)
    : error ? JSON.stringify(error)
    : message;
  if (status && msg) {
    return `${status} ${msg}`;
  }
  if (status) {
    return `${status} status code (no body)`; // this error is returned, because msg is undefined
  }
  if (msg) {
    return msg;
  }
  return '(no status code or body)';
}
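Since the "(no body)" branch is the one that fires, both error and message must be arriving empty here. A minimal sketch of the change this report is hinting at, assuming errMessage is what ultimately reaches makeMessage as the message argument (illustration only, not the package's actual code):

// Hypothetical adjustment: instead of discarding the parsed JSON, fall back to
// its top-level "message" field, which is where vLLM puts the human-readable text.
const errText = await response.text().catch((e) => castToError(e).message);
const errJSON = safeJSON(errText);
const errMessage = errJSON?.message ?? (errJSON ? undefined : errText);

// With the vLLM body above, makeMessage(400, error, errMessage) would then take the
// `${status} ${msg}` branch and return
// "400 This model's maximum context length is 2048 tokens. ..."
// instead of "400 status code (no body)".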