Nondeterministic Agent Tests #1182

Open
1 of 2 tasks
PaulMDell opened this issue Feb 20, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@PaulMDell

System Info

PyTorch version: 2.5.1+cu124
CUDA used to build PyTorch: 12.4
2x NVIDIA A40 GPUs

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

When running the agent tests with pytest (pytest -s -v tests/client-sdk/agents) on Llama Stack 0.1.3, some tests are nondeterministic: their results differ from one run to the next. The tests I've noticed with this issue are:

  • test_custom_tool
  • test_builtin_tool_code_execution
  • test_code_interpreter_for_attachments

The model server is ghcr.io/huggingface/text-generation-inference:2.3.1, hosting the meta-llama/Llama-3.1-8B-Instruct model. Llama Stack itself runs in the llamastack/distribution-tgi:0.1.3 container.

Error logs

Below are three separate runs of the agent tests, showing how the results change from run to run.

Test 1:
tests/client-sdk/agents/test_agents.py::test_tool_config[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_code_execution[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_code_interpreter_for_attachments[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_custom_tool[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_tool_choice[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_create_turn_response[meta-llama/Llama-3.1-8B-Instruct] FAILED

Test 2:
tests/client-sdk/agents/test_agents.py::test_tool_config[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_code_execution[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_code_interpreter_for_attachments[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_custom_tool[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_tool_choice[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_create_turn_response[meta-llama/Llama-3.1-8B-Instruct] FAILED

Test 3:
tests/client-sdk/agents/test_agents.py::test_tool_config[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_web_search[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_builtin_tool_code_execution[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_code_interpreter_for_attachments[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_custom_tool[meta-llama/Llama-3.1-8B-Instruct] PASSED
tests/client-sdk/agents/test_agents.py::test_tool_choice[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_rag_and_code_agent[meta-llama/Llama-3.1-8B-Instruct] FAILED
tests/client-sdk/agents/test_agents.py::test_create_turn_response[meta-llama/Llama-3.1-8B-Instruct] FAILED
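To make the comparison across runs less error-prone, the pytest -v output of each run can be diffed mechanically. The following is a minimal sketch, not part of the Llama Stack test suite: the helper names parse_run and find_flaky are hypothetical, and it assumes each run's output was saved as text with one "test-id OUTCOME" line per test, as in the logs above.

```python
import re
from collections import defaultdict

# Matches pytest -v result lines such as:
#   tests/.../test_agents.py::test_custom_tool[model-id] PASSED
# (assumed line shape, taken from the logs in this issue)
RESULT_LINE = re.compile(
    r"^(?P<test>\S+::\S+)\s+(?P<outcome>PASSED|FAILED|ERROR|SKIPPED)\b"
)


def parse_run(log_text):
    """Map each test id found in one run's output to its outcome."""
    outcomes = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match:
            outcomes[match.group("test")] = match.group("outcome")
    return outcomes


def find_flaky(runs):
    """Return tests whose outcome differs between runs.

    `runs` is a list of log texts, one per pytest invocation.
    The result maps test id -> list of outcomes in run order.
    """
    seen = defaultdict(list)
    for run in runs:
        for test, outcome in parse_run(run).items():
            seen[test].append(outcome)
    # A test is flaky if it produced more than one distinct outcome.
    return {test: results for test, results in seen.items() if len(set(results)) > 1}
```

Passing the three logs above to find_flaky as a list of strings should report only the tests whose outcome changed between runs. Tests that fail on every run are excluded, since they point to a different kind of problem.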

Expected behavior

Each test should return the same result on every run, no matter how many times the suite is executed.

@PaulMDell PaulMDell added the bug Something isn't working label Feb 20, 2025
@terrytangyuan
Collaborator

Do you have details on what caused the failures? Any logs would be helpful here. I previously observed an issue in these tests where tool calls are not executed (#1147).
