Describe the bug
In ToolCallAccuracy, if the user_input contains more ToolCalls than reference_tool_calls, the extra calls do not affect the evaluation score. In other words, the score remains unchanged even when more ToolCalls than expected occur.
Ragas version: latest
Python version: 3.12
Code to Reproduce
from ragas.dataset_schema import MultiTurnSample
from ragas.messages import AIMessage, HumanMessage, ToolCall, ToolMessage

conversation = [
    HumanMessage(content="What's the weather like in New York right now?"),
    AIMessage(
        content="The current temperature in New York is 75°F and it's partly cloudy.",
        tool_calls=[ToolCall(name="weather_check", args={"location": "New York"})],
    ),
    HumanMessage(content="Can you translate that to Celsius?"),
    AIMessage(
        content="Let me convert that to Celsius for you.",
        tool_calls=[ToolCall(name="temperature_conversion", args={"temperature_fahrenheit": 75})],
    ),
    ToolMessage(content="75°F is approximately 23.9°C."),
    AIMessage(content="75°F is approximately 23.9°C."),
]

sample = MultiTurnSample(
    user_input=conversation,
    reference_tool_calls=[
        ToolCall(name="weather_check", args={"location": "New York"})
    ],
)
Output:
1
Error trace
Expected behavior
The evaluation score should be 0 (or at least lowered), since an unexpected tool call occurred. I think there may be differing opinions on the exact behavior, though.
Additional context
Add any other context about the problem here.
In your case, there was an unnecessary tool call, "temperature_conversion". So I think the score should be low, because the agent did something unexpected.
I was a bit confused, but I'm gradually understanding.
In this case, the temperature_conversion tool call was not expected.
Therefore, even if weather_check was executed correctly, including its args, the score is 0 under this metric.
Is my understanding correct?
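To illustrate the behavior being requested, here is a minimal sketch of a scoring function that penalizes extra, unreferenced tool calls. This is a hypothetical illustration, not the actual ragas implementation; the function name and the (name, args) tuple representation are assumptions for the example.

```python
# Hypothetical sketch: score matched reference calls against the larger of the
# two call counts, so that extra calls reduce the score instead of being ignored.

def tool_call_score(predicted, reference):
    """Return a score in [0, 1]; extra or missing tool calls both lower it.

    `predicted` and `reference` are lists of (name, args) tuples.
    """
    if not predicted and not reference:
        return 1.0
    remaining = list(reference)
    matched = 0
    for call in predicted:
        if call in remaining:
            remaining.remove(call)
            matched += 1
    # Dividing by max(...) penalizes unexpected extra calls as well as misses.
    return matched / max(len(predicted), len(reference))


predicted = [
    ("weather_check", (("location", "New York"),)),
    ("temperature_conversion", (("temperature_fahrenheit", 75),)),
]
reference = [("weather_check", (("location", "New York"),))]
print(tool_call_score(predicted, reference))  # 0.5 — the extra call halves the score
```

Under this scheme, the example from the issue would score 0.5 rather than 1, while a conversation with exactly the referenced tool calls would still score 1.0.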