Got a wrong result with DeepSeek-Distill-Qwen-7b while running vllm serving with OMP_NUM_THREADS=16 #491

shanzhou2186 · 2025-02-18T07:51:17Z

When serving the model DeepSeek-Distill-Qwen-7b with vllm and 16 threads (OMP_NUM_THREADS=16), the model returns a wrong result for the prompt below.
Versions: xfastertransformer 1.8.2, vllm-xft 0.5.5.0

With 12 threads (OMP_NUM_THREADS=12), the result is correct.

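The request and the incorrect response are shown below. The prompt (in Chinese) asks the model to generate a Gomoku (five-in-a-row) game with all of the code in a single HTML file. The response is nothing but "!" characters until max_tokens is reached (finish_reason "length"):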

curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-qwen-7b-xft",
"messages": [{"role": "user", "content": "请帮我用 HTML 生成一个五子棋游戏，所有代码都保存在一个 HTML 中。"}],
"max_tokens": 256,
"temperature": 0.7
}'
{"id":"chat-9dc50d6d9c8b499f9b4e13c0f9cd7644","object":"chat.completion","created":1739864206,"model":"deepseek-qwen-7b-xft","choices":[{"index":0,"message":{"role":"assistant","content":"!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!","tool_calls":[]},"logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":{"prompt_tokens":23,"total_tokens":279,"completion_tokens":256},"prompt_logprobs":null}(base)
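To compare runs at OMP_NUM_THREADS=12 and OMP_NUM_THREADS=16 without reading each reply by eye, a minimal bash sketch like the following could flag the symptom. It assumes the server from this report is listening on localhost:8000 and that jq is installed. The helper names is_degenerate and fetch_reply are hypothetical and are not part of vllm-xft. The check only detects a reply made of one repeated character, like the "!!!!" output in this issue; it does not judge whether a normal-looking reply is actually correct.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: vllm-xft server on localhost:8000, jq installed.
# is_degenerate and fetch_reply are hypothetical helper names.

# Succeed (return 0) if $1 is non-empty and consists of a single repeated
# character, e.g. "!!!!!!", the pattern seen with OMP_NUM_THREADS=16.
is_degenerate() {
  local text="$1"
  [ -n "$text" ] || return 1
  local first="${text:0:1}"
  # Delete every occurrence of the first character (quoted, so it is matched
  # literally); if nothing is left, the whole reply was that one character.
  [ -z "${text//"$first"/}" ]
}

# Send the same chat request as in the report and print only the reply text.
fetch_reply() {
  curl -s http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "deepseek-qwen-7b-xft",
      "messages": [{"role": "user", "content": "请帮我用 HTML 生成一个五子棋游戏，所有代码都保存在一个 HTML 中。"}],
      "max_tokens": 256,
      "temperature": 0.7
    }' | jq -r '.choices[0].message.content'
}

# Usage, after starting the server with the thread count under test:
#   source this file, then run:
#   is_degenerate "$(fetch_reply)" && echo "degenerate output"
```

Because temperature is 0.7, a single normal-looking reply at 12 threads does not prove the 16-thread path is the only cause; repeating the check a few times per thread count gives a more reliable comparison.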
