
fix: Get builtin tool calling working in remote-vllm #1236

Open · wants to merge 1 commit into base: main

Conversation

bbrowning (Contributor)

What does this PR do?

This PR makes a couple of changes required to get the test `tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search` passing on the remote-vllm provider.

First, we adjust `agent_instance` to also pass in the description and parameters of builtin tools. We need these so we can pass each tool's expected parameters to vLLM. The meta-reference implementations may not have needed them for builtin tools, since they can take advantage of the Llama-model-specific support for certain builtin tools. With vLLM, however, our server-side chat templates for tool calling treat all tools the same and don't distinguish Llama builtin tools from custom tools, so we need to pass the full set of parameter definitions and the list of required parameters for builtin tools as well.
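
To illustrate the shape of this change, here is a minimal sketch. It uses simplified stand-in dataclasses rather than the real llama_stack `ToolDefinition`/`ToolParamDefinition` types (whose import paths and exact fields may differ), and the `query` parameter schema is hypothetical, not the exact upstream one:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Simplified stand-ins for llama_stack's ToolParamDefinition / ToolDefinition;
# the real types live in the llama_stack datatypes and may carry more fields.
@dataclass
class ToolParamDefinition:
    param_type: str
    description: str
    required: bool = True

@dataclass
class ToolDefinition:
    tool_name: str
    description: Optional[str] = None
    parameters: Dict[str, ToolParamDefinition] = field(default_factory=dict)

# Before: builtin tools were registered by name only, so the server-side chat
# template had no schema to render for them.
web_search_before = ToolDefinition(tool_name="brave_search")

# After: also pass the description and expected parameters, so vLLM's
# tool-calling template can treat a builtin tool like any custom tool.
# (The "query" parameter here is illustrative only.)
web_search_after = ToolDefinition(
    tool_name="brave_search",
    description="Search the web for current information.",
    parameters={
        "query": ToolParamDefinition(
            param_type="string",
            description="The search query to run",
            required=True,
        ),
    },
)

print(web_search_after)
```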

Next, we adjust the vLLM streaming chat completion code to fix some edge cases where it returned an extra ChatCompletionResponseEvent containing an empty ToolCall whose call_id, tool_name, and arguments were all empty strings. This bug surfaced after the above fix: following a successful tool invocation, we were sending extra chunks back to the client with these empty ToolCalls.
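
To make the second fix concrete, here is a hedged sketch of the kind of guard involved, again using simplified stand-in types rather than the provider's actual ToolCall and event classes:

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Simplified stand-in for the ToolCall delta carried by a streamed chunk.
@dataclass
class ToolCall:
    call_id: str = ""
    tool_name: str = ""
    arguments: str = ""

def is_empty_tool_call(tc: ToolCall) -> bool:
    """True when a streamed delta carries no actual tool-call content."""
    return not (tc.call_id or tc.tool_name or tc.arguments)

def forward_tool_calls(deltas: Iterable[ToolCall]) -> Iterator[ToolCall]:
    """Yield only meaningful tool-call deltas.

    This mirrors the idea of the fix: after a successful tool invocation the
    stream could still produce chunks whose ToolCall fields were all empty
    strings, and those should not become extra ChatCompletionResponseEvents
    sent to the client.
    """
    for tc in deltas:
        if is_empty_tool_call(tc):
            continue  # drop the spurious empty chunk instead of emitting an event
        yield tc

# Example: only the populated delta is forwarded; the trailing empty one is dropped.
deltas = [ToolCall("call_1", "brave_search", '{"query": "llama stack"}'), ToolCall()]
print(list(forward_tool_calls(deltas)))
```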

Test Plan

With these changes, the following test that previously failed now passes:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v \
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search \
--inference-model "meta-llama/Llama-3.2-3B-Instruct"
```

Additionally, I ran the remote-vllm client-sdk and provider inference tests as below to ensure they all still passed with this change:

```
VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v \
tests/client-sdk/inference/test_text_inference.py \
--inference-model "meta-llama/Llama-3.2-3B-Instruct"
```

```
VLLM_URL="http://localhost:8000/v1" \
python -m pytest -s -v \
llama_stack/providers/tests/inference/test_text_inference.py \
--providers "inference=vllm_remote"
```

Signed-off-by: Ben Browning <[email protected]>
facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Feb 24, 2025
```
@@ -905,7 +905,19 @@ async def _get_tool_defs(
     if tool_def_map.get(built_in_type, None):
         raise ValueError(f"Tool {built_in_type} already exists")

-    tool_def_map[built_in_type] = ToolDefinition(tool_name=built_in_type)
+    tool_def_map[built_in_type] = ToolDefinition(
```

Collaborator:

@ashwinb @yanxi0830 Could you help double-check this change?

Contributor:

I'm not sure whether having additional fields here would break things (it seems not?), but if it doesn't, it looks like we can just set the key value for tool_def_map here, and then L926 below should take care of this.
