Inference Pytest Failed at text_chat_completion_with_tool_calling scenarios #1185

Open · dawenxi-007 opened this issue Feb 20, 2025 · 4 comments
Labels: bug (Something isn't working)

@dawenxi-007
System Info

CUDA: 12.4 / Driver: 550.120 / 1xH100

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

Running pytest against a host serving vLLM and pgvector. All the tool-calling cases fail, while the others pass. The logs include an SQLite write error, but it may be unrelated to the failures, since the passing cases emit the same error. I want to understand the root cause.

Command:

LLAMA_STACK_CONFIG="/home/tao/llamastk_vllm/vllm-run.yaml" pytest -s -v tests/client-sdk/inference/test_text_inference.py

The following is the YAML run config:

version: '2'
image_name: dell-llamastk
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
providers:
  inference:
  - provider_id: vllm0
    provider_type: remote::vllm
    config:
      url: ${env.VLLM_URL}/v1
  - provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
    config: {}
  vector_io:
  - provider_id: pgvector
    provider_type: remote::pgvector
    config:
      host: ${env.MACHINE_IP}
      port: 5432
      db: postgres
      user: postgres
      password: mysecretpassword
  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config: {}
  - provider_id: code-scanner
    provider_type: inline::code-scanner
    config: {}
  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence_store:
        type: sqlite
        namespace: null
        db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/meta-reference-gpu}/agents_store.db
  telemetry:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      service_name: ${env.OTEL_SERVICE_NAME:llama-stack}
      sinks: ${env.TELEMETRY_SINKS:console,sqlite}
      sqlite_db_path: ${env.SQLITE_DB_PATH:~/.llama/distributions/meta-reference-gpu/trace_store.db}
  eval:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config: {}
  datasetio:
  - provider_id: huggingface
    provider_type: remote::huggingface
    config: {}
  - provider_id: localfs
    provider_type: inline::localfs
    config: {}
  scoring:
  - provider_id: basic
    provider_type: inline::basic
    config: {}
  - provider_id: llm-as-judge
    provider_type: inline::llm-as-judge
    config: {}
  - provider_id: braintrust
    provider_type: inline::braintrust
    config:
      openai_api_key: ${env.OPENAI_API_KEY:}
  tool_runtime:
  - provider_id: brave-search
    provider_type: remote::brave-search
    config:
      api_key: ${env.BRAVE_SEARCH_API_KEY:}
      max_results: 3
  - provider_id: tavily-search
    provider_type: remote::tavily-search
    config:
      api_key: ${env.TAVILY_SEARCH_API_KEY:}
      max_results: 3
  - provider_id: code-interpreter
    provider_type: inline::code-interpreter
    config: {}
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime
    config: {}
  - provider_id: model-context-protocol
    provider_type: remote::model-context-protocol
    config: {}
metadata_store:
  type: sqlite
  db_path: ${env.SQLITE_STORE_DIR:~/.llama/distributions/meta-reference-gpu}/registry.db
models:
- metadata:
    llama_model: ${env.LLAMA_MODEL}
  model_id: ${env.LLAMA_MODEL}
  provider_id: vllm0
  provider_model_id: ${env.INFERENCE_MODEL}
- metadata:
    embedding_dimension: 384
  model_id: all-MiniLM-L6-v2
  provider_id: sentence-transformers
  model_type: embedding
shields: []
vector_dbs: []
datasets: []
scoring_fns: []
eval_tasks: []
tool_groups:
- toolgroup_id: builtin::websearch
  provider_id: tavily-search
- toolgroup_id: builtin::rag
  provider_id: rag-runtime
- toolgroup_id: builtin::code_interpreter
  provider_id: code-interpreter

Error logs

tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
FAILED
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
FAILED
Error exporting span to SQLite: attempt to write a readonly database

tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
FAILED
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-True] Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
FAILED
tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-False] Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database
Error exporting span to SQLite: attempt to write a readonly database

FAILED

========================================================================= FAILURES ==========================================================================
______________________________ test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] ______________________________
tests/client-sdk/inference/test_text_inference.py:198: in test_text_chat_completion_with_tool_calling_and_non_streaming
    response = llama_stack_client.inference.chat_completion(
../env/lib/python3.10/site-packages/llama_stack_client/_utils/_utils.py:275: in wrapper
    return func(*args, **kwargs)
../env/lib/python3.10/site-packages/llama_stack_client/resources/inference.py:290: in chat_completion
    return self._post(
../env/lib/python3.10/site-packages/llama_stack_client/_base_client.py:1273: in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
../env/lib/python3.10/site-packages/llama_stack/distribution/library_client.py:168: in request
    return asyncio.run(self.async_client.request(*args, **kwargs))
/usr/lib/python3.10/asyncio/runners.py:44: in run
    return loop.run_until_complete(main)
/usr/lib/python3.10/asyncio/base_events.py:649: in run_until_complete
    return future.result()
../env/lib/python3.10/site-packages/llama_stack/distribution/library_client.py:277: in request
    response = await self._call_non_streaming(
../env/lib/python3.10/site-packages/llama_stack/distribution/library_client.py:324: in _call_non_streaming
...
../env/lib/python3.10/site-packages/openai/_base_client.py:967: in request
    return self._request(
../env/lib/python3.10/site-packages/openai/_base_client.py:1071: in _request
    raise self._make_status_error_from_response(err.response) from None
E   openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set', 'type': 'BadRequestError', 'param': None, 'code': 400}
--------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------
INFO     httpx:_client.py:1025 HTTP Request: POST http://100.67.149.82/v1/chat/completions "HTTP/1.1 400 Bad Request"
________________________________ test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] ________________________________
tests/client-sdk/inference/test_text_inference.py:246: in test_text_chat_completion_with_tool_calling_and_streaming
    tool_invocation_content = extract_tool_invocation_content(response)
tests/client-sdk/inference/test_text_inference.py:224: in extract_tool_invocation_content
    for chunk in response:
../env/lib/python3.10/site-packages/llama_stack/distribution/library_client.py:159: in sync_generator
    chunk = loop.run_until_complete(async_stream.__anext__())
/usr/lib/python3.10/asyncio/base_events.py:649: in run_until_complete
    return future.result()
../env/lib/python3.10/site-packages/llama_stack_client/_streaming.py:105: in __anext__
    return await self._iterator.__anext__()
...
E   openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set', 'type': 'BadRequestError', 'param': None, 'code': 400}
--------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------
INFO     httpx:_client.py:1025 HTTP Request: POST http://100.67.149.82/v1/chat/completions "HTTP/1.1 400 Bad Request"
_______________________________________ test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] _______________________________________
tests/client-sdk/inference/test_text_inference.py:275: in test_text_chat_completion_structured_output
    answer = AnswerFormat.model_validate_json(response.completion_message.content)
E   pydantic_core._pydantic_core.ValidationError: 1 validation error for AnswerFormat
E     Invalid JSON: EOF while parsing an object at line 8191 column 0 [type=json_invalid, input_value='{   \n\n\n\n\n\n   \n\n\...\n\n\n\n   \n\n\n\n\n\n', input_type=str]
E       For further information visit https://errors.pydantic.dev/2.10/v/json_invalid
--------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------
INFO     httpx:_client.py:1025 HTTP Request: POST http://100.67.149.82/v1/chat/completions "HTTP/1.1 200 OK"
____________________________ test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-True] _____________________________
tests/client-sdk/inference/test_text_inference.py:348: in test_text_chat_completion_tool_calling_tools_not_in_request

E   openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set', 'type': 'BadRequestError', 'param': None, 'code': 400}
--------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------
INFO     httpx:_client.py:1025 HTTP Request: POST http://100.67.149.82/v1/chat/completions "HTTP/1.1 400 Bad Request"
____________________________ test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-False] ____________________________
tests/client-sdk/inference/test_text_inference.py:345: in test_text_chat_completion_tool_calling_tools_not_in_request
    response = llama_stack_client.inference.chat_completion(**request)
../env/lib/python3.10/site-packages/llama_stack_client/_utils/_utils.py:275: in wrapper
    return func(*args, **kwargs)
../env/lib/python3.10/site-packages/llama_stack_client/resources/inference.py:290: in chat_completion
...
    raise self._make_status_error_from_response(err.response) from None
E   openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set', 'type': 'BadRequestError', 'param': None, 'code': 400}
--------------------------------------------------------------------- Captured log call ---------------------------------------------------------------------
INFO     httpx:_client.py:1025 HTTP Request: POST http://100.67.149.82/v1/chat/completions "HTTP/1.1 400 Bad Request"
================================================================== short test summary info ==================================================================
FAILED tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_non_streaming[meta-llama/Llama-3.1-8B-Instruct] - openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser t...
FAILED tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_with_tool_calling_and_streaming[meta-llama/Llama-3.1-8B-Instruct] - openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser t...
FAILED tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_structured_output[meta-llama/Llama-3.1-8B-Instruct] - pydantic_core._pydantic_core.ValidationError: 1 validation error for AnswerFormat
FAILED tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-True] - openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser t...
FAILED tests/client-sdk/inference/test_text_inference.py::test_text_chat_completion_tool_calling_tools_not_in_request[meta-llama/Llama-3.1-8B-Instruct-False] - openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser t...
========================================================= 5 failed, 9 passed, 2 warnings in 51.24s ==========================================================

Expected behavior

All test cases should pass for text inference.

dawenxi-007 added the bug label on Feb 20, 2025
@dawenxi-007 (Author)

As an aside, it would be good to document the package dependencies. To be able to run the tests, I had to install the following packages:

aiosqlite 
openai 
psycopg2-binary 
chardet 
pypdf 
mcp 
opentelemetry-api 
opentelemetry-sdk 
opentelemetry-exporter-otlp 
autoevals 
sentence_transformers
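
For convenience, the same list as a single install command (package names copied from the list above):

    pip install aiosqlite openai psycopg2-binary chardet pypdf mcp \
        opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp \
        autoevals sentence_transformers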

@terrytangyuan (Collaborator)

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set', 'type': 'BadRequestError', 'param': None, 'code': 400}

Your vLLM server does not have tool calling enabled.
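
For reference, a minimal sketch of a vLLM invocation with tool calling enabled (the two flags come straight from the error message; the model name and the llama3_json parser choice are assumptions for Llama 3.1):

    vllm serve meta-llama/Llama-3.1-8B-Instruct \
        --enable-auto-tool-choice \
        --tool-call-parser llama3_json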

@dawenxi-007 (Author)

Thanks for the quick response!

What is the command to enable tool calling in vLLM through Docker? I didn't see it described in the Llama Stack documentation. The vLLM documentation here only shows the vLLM command-level arguments. When I tried adding the same arguments to the docker command:

        docker run -d --rm \
            --runtime nvidia \
            --shm-size 1g \
            -p $INFERENCE_PORT:$INFERENCE_PORT \
            --gpus all \
            -v ~/.cache/huggingface:/root/.cache/huggingface \
            --env "HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN" \
            --ipc=host vllm/vllm-openai:latest \
            --gpu-memory-utilization 0.9 \
            --model $INFERENCE_MODEL \
            --enable-auto-tool-choice \
            --tool-call-parser llama3_json \
            --chat-template vllm/examples/tool_chat_template_llama3.1_json.jinja \
            --tensor-parallel-size 1 \
            --port 80

I got the following error:

ValueError: The supplied chat template string (vllm/examples/tool_chat_template_llama3.1_json.jinja) appears path-like, but doesn't exist!

I do have the jinja file at that location on the host.

What about other inference engines like TGI? Do we need to purposely enable the tool-calling capability? The PyTorch flow does not seem to need it.
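
One likely explanation for the chat-template error (an assumption, based on how Docker resolves paths): --chat-template is interpreted inside the container, so a host-side relative path such as vllm/examples/tool_chat_template_llama3.1_json.jinja does not exist there. A sketch of a workaround is to bind-mount the template into the container and pass the container-side path (the host path and /templates mount point below are illustrative):

        docker run -d --rm \
            --runtime nvidia \
            --shm-size 1g \
            -p $INFERENCE_PORT:$INFERENCE_PORT \
            --gpus all \
            -v ~/.cache/huggingface:/root/.cache/huggingface \
            -v ~/vllm/examples/tool_chat_template_llama3.1_json.jinja:/templates/tool_chat_template_llama3.1_json.jinja \
            --env "HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN" \
            --ipc=host vllm/vllm-openai:latest \
            --gpu-memory-utilization 0.9 \
            --model $INFERENCE_MODEL \
            --enable-auto-tool-choice \
            --tool-call-parser llama3_json \
            --chat-template /templates/tool_chat_template_llama3.1_json.jinja \
            --tensor-parallel-size 1 \
            --port 80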

@dawenxi-007 (Author)

I tested TGI and it does not have this issue. There are still messages like "Error exporting span to SQLite: attempt to write a readonly database", but they are not counted as errors in the report. I am using pgvector as the vector_io provider, so I am not sure where this message comes from.
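
Judging from the run config above, this message most likely comes from the telemetry provider's sqlite sink (the sqlite_db_path entry under telemetry), not from pgvector. Two hedged workarounds, assuming the trace store file is simply not writable by the user running pytest (the TELEMETRY_SINKS variable and the db path are taken from the yaml above):

    # Option 1: keep only the console telemetry sink
    export TELEMETRY_SINKS=console

    # Option 2: make the trace store writable for the current user
    chmod u+w ~/.llama/distributions/meta-reference-gpu/trace_store.db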
