Nondeterministic Agent Tests #1182

PaulMDell · 2025-02-20T20:06:05Z

System Info

PyTorch version: 2.5.1+cu124
CUDA used to build PyTorch: 12.4
2x NVIDIA A40 GPUs

Information

The official example scripts
My own modified scripts

🐛 Describe the bug

When running the pytest agent tests (pytest -s -v tests/client-sdk/agents) against Llama Stack 0.1.3, some tests are nondeterministic: their results change from one run to the next. The tests where I've noticed this are:

test_custom_tool
test_builtin_tool_code_execution
test_code_interpreter_for_attachments

The model server is ghcr.io/huggingface/text-generation-inference:2.3.1, hosting the meta-llama/Llama-3.1-8B-Instruct model, and Llama Stack runs in the llamastack/distribution-tgi:0.1.3 container.

Error logs

Below are 3 separate runs of the agent tests to show the changing results.

Test 1:
tests/client-sdk/agents/test_agents.py::test_tool_config[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_code_execution[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_code_interpreter_for_attachments[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_custom_tool[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_tool_choice[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_create_turn_response[meta-llama/Llama-3.1-8B-Instruct] FAILED

Test 2:
tests/client-sdk/agents/test_agents.py::test_tool_config[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_code_execution[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_code_interpreter_for_attachments[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_custom_tool[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_tool_choice[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_create_turn_response[meta-llama/Llama-3.1-8B-Instruct] FAILED

Test 3:
tests/client-sdk/agents/test_agents.py::test_tool_config[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_code_execution[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_code_interpreter_for_attachments[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_custom_tool[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_tool_choice[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_create_turn_response[meta-llama/Llama-3.1-8B-Instruct] FAILED
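
To make the comparison between runs less error-prone, the logs above could be diffed mechanically. The following is a minimal sketch, not part of the Llama Stack test suite: it assumes each run's `pytest -v` output has been captured as a string, and the helper names `collect_outcomes` and `find_flaky` are hypothetical.

```python
import re
from collections import defaultdict

# Matches pytest -v result lines such as
#   tests/.../test_agents.py::test_custom_tool[model-id] PASSED
RESULT_LINE = re.compile(
    r"^(\S+::\S+)\s+(PASSED|FAILED|ERROR|SKIPPED|XFAIL|XPASS)\b"
)


def collect_outcomes(runs):
    """Map each test ID to its list of outcomes, one entry per run."""
    outcomes = defaultdict(list)
    for log in runs:
        for line in log.splitlines():
            match = RESULT_LINE.match(line.strip())
            if match:
                outcomes[match.group(1)].append(match.group(2))
    return dict(outcomes)


def find_flaky(runs):
    """Return only the tests whose outcome differs between runs."""
    return {
        test_id: results
        for test_id, results in collect_outcomes(runs).items()
        if len(set(results)) > 1
    }
```

Passing the three logs above as `find_flaky([run1, run2, run3])` should list only the tests whose status changed between runs, leaving out tests that pass or fail consistently.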

Expected behavior

Tests should deterministically return the same result no matter how many times they are executed.

terrytangyuan · 2025-02-21T05:07:53Z

Do you have details on what caused those tests to fail? Any logs would be helpful here. I previously observed an issue in these tests where tool calls are not executed (#1147).

PaulMDell added the "bug" label · Feb 20, 2025
