System Info
PyTorch version: 2.5.1+cu124
CUDA used to build PyTorch: 12.4
GPUs: 2x NVIDIA A40
Information
The official example scripts
My own modified scripts
🐛 Describe the bug
When running the pytest agent tests (pytest -s -v tests/client-sdk/agents) in Llama Stack 0.1.3, some tests are nondeterministic and return different results on each run. The tests I've noticed with this issue are:
test_custom_tool
test_builtin_tool_code_execution
test_code_interpreter_for_attachments
The model server is ghcr.io/huggingface/text-generation-inference:2.3.1 hosting the meta-llama/Llama-3.1-8B-Instruct model, and Llama Stack runs from the llamastack/distribution-tgi:0.1.3 container.
Do you have details on what caused those test failures? Any logs would help here. I previously observed an issue in these tests where tool calls were not executed (#1147).
Error logs
Below are 3 separate runs of the agent tests to show the changing results.
Test 1:
tests/client-sdk/agents/test_agents.py::test_tool_config[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_code_execution[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_code_interpreter_for_attachments[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_custom_tool[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_tool_choice[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_create_turn_response[meta-llama/Llama-3.1-8B-Instruct] FAILED
Test 2:
tests/client-sdk/agents/test_agents.py::test_tool_config[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_code_execution[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_code_interpreter_for_attachments[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_custom_tool[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_tool_choice[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_create_turn_response[meta-llama/Llama-3.1-8B-Instruct] FAILED
Test 3:
tests/client-sdk/agents/test_agents.py::test_tool_config[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_code_execution[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_code_interpreter_for_attachments[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_custom_tool[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_tool_choice[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_create_turn_response[meta-llama/Llama-3.1-8B-Instruct] FAILED
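To make the pattern across the three runs easier to see, the following is a small Python sketch that tabulates the outcomes exactly as reported above and labels each test as a stable pass, a stable fail, or flaky. Test ids are shortened to the function name (all are parametrized with meta-llama/Llama-3.1-8B-Instruct); OUTCOMES and classify are illustrative names, not part of the Llama Stack test suite.

```python
# Outcomes transcribed from the three runs above, in run order (run 1, run 2, run 3).
# Every test id is parametrized with [meta-llama/Llama-3.1-8B-Instruct].
OUTCOMES: dict[str, tuple[str, ...]] = {
    "test_tool_config": ("PASSED", "PASSED", "PASSED"),
    "test_builtin_tool_web_search": ("FAILED", "FAILED", "FAILED"),
    "test_builtin_tool_code_execution": ("PASSED", "FAILED", "FAILED"),
    "test_code_interpreter_for_attachments": ("PASSED", "FAILED", "PASSED"),
    "test_custom_tool": ("FAILED", "PASSED", "PASSED"),
    "test_tool_choice": ("FAILED", "FAILED", "FAILED"),
    "test_rag_agent": ("FAILED", "FAILED", "FAILED"),
    "test_rag_and_code_agent": ("FAILED", "FAILED", "FAILED"),
    "test_create_turn_response": ("FAILED", "FAILED", "FAILED"),
}


def classify(outcomes: dict[str, tuple[str, ...]]) -> dict[str, str]:
    """Label each test 'stable pass', 'stable fail' or 'flaky' from its per-run results."""
    labels = {}
    for test, runs in outcomes.items():
        distinct = set(runs)
        if distinct == {"PASSED"}:
            labels[test] = "stable pass"
        elif distinct == {"FAILED"}:
            labels[test] = "stable fail"
        else:
            # At least one run passed and at least one run failed.
            labels[test] = "flaky"
    return labels


if __name__ == "__main__":
    for test, label in classify(OUTCOMES).items():
        passes = OUTCOMES[test].count("PASSED")
        print(f"{test}: {label} ({passes}/{len(OUTCOMES[test])} passed)")
```

With the data above, the tests labeled flaky are test_builtin_tool_code_execution, test_code_interpreter_for_attachments and test_custom_tool, while test_builtin_tool_web_search, test_tool_choice, test_rag_agent, test_rag_and_code_agent and test_create_turn_response failed in all three runs.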
Expected behavior
Tests should deterministically return the same result no matter how many times they are executed.
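One way to check this expectation mechanically is to run the suite several times and compare each test's outcome across runs. The Python sketch below is a hypothetical helper, not part of Llama Stack or its test suite; it assumes pytest's verbose result-line format ("node_id PASSED [ NN%]") and deliberately omits -s so that printed test output cannot split a result line.

```python
import re
import subprocess
from collections import defaultdict

# Matches pytest -v result lines such as
# "tests/client-sdk/agents/test_agents.py::test_custom_tool[...] PASSED  [ 55%]".
# Summary lines like "FAILED path::test - reason" start with the outcome and are ignored.
RESULT_LINE = re.compile(
    r"^(?P<nodeid>\S+::\S+)\s+(?P<outcome>PASSED|FAILED|ERROR|SKIPPED|XFAIL|XPASS)\b"
)


def parse_outcomes(output: str) -> dict[str, str]:
    """Map each pytest node id to its outcome in the output of one verbose run."""
    outcomes = {}
    for line in output.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            outcomes[match.group("nodeid")] = match.group("outcome")
    return outcomes


def run_repeatedly(test_path: str, runs: int) -> dict[str, list[str]]:
    """Run `pytest -v test_path` `runs` times and collect every test's outcomes in order."""
    history: dict[str, list[str]] = defaultdict(list)
    for _ in range(runs):
        proc = subprocess.run(
            ["pytest", "-v", test_path], capture_output=True, text=True, check=False
        )
        for nodeid, outcome in parse_outcomes(proc.stdout).items():
            history[nodeid].append(outcome)
    return dict(history)


def nondeterministic(history: dict[str, list[str]]) -> list[str]:
    """Return the node ids whose outcome was not identical across all runs."""
    return sorted(nodeid for nodeid, outcomes in history.items() if len(set(outcomes)) > 1)
```

For example, calling nondeterministic(run_repeatedly("tests/client-sdk/agents", runs=3)) from the repository root, against the same TGI server and distribution container, would list the node ids whose result changed between runs; an empty list is what the expected behavior describes.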