Add a test case for a single dimension evaluation #123

bugsz · 2024-06-24T19:54:11Z

📑 Description

This PR adds a test case for the issue mentioned in #89.
Specifically, it adds a dummy evaluator with only one evaluation dimension (goal) and a new option for the response_format in the evaluator.
The test also uses the same format as a real Sotopia simulation, so the test case is aligned with the actual evaluation.
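
For readers unfamiliar with the setup, a single-dimension response format could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in this PR: `GoalOnlyEvaluation`, `parse_goal_only`, the JSON shape, and the 0-10 score range are all hypothetical, and the sketch uses only the standard library rather than Sotopia's own models.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalOnlyEvaluation:
    """Hypothetical response format with a single dimension: goal."""

    reasoning: str
    score: int

    def __post_init__(self) -> None:
        # Assumed score range of 0-10 for the goal dimension.
        if not 0 <= self.score <= 10:
            raise ValueError(f"goal score out of range: {self.score}")


def parse_goal_only(raw: str) -> GoalOnlyEvaluation:
    """Parse a reply of the assumed form {"goal": [reasoning, score]}."""
    payload = json.loads(raw)
    reasoning, score = payload["goal"]
    return GoalOnlyEvaluation(reasoning=str(reasoning), score=int(score))


# Made-up evaluator reply, used only to exercise the parser.
example = parse_goal_only('{"goal": ["The agent secured the agreed discount.", 8]}')
print(example.score)
```

Keeping the reasoning and the score in one object means a test can assert on both halves of the single dimension at once.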

bugsz · 2024-06-24T19:57:01Z

By the way, I am currently using assert False to see the output, so pytest is definitely not passing right now. However, I do not know how to check whether there is a reasoning part. Does anyone have an idea?
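
One possible way to check for a reasoning part without `assert False` is to assert on the parsed result directly. The sketch below assumes the evaluator output has already been parsed into a `(reasoning, score)` pair; `has_reasoning` and the sample values are hypothetical and not taken from the Sotopia codebase. To simply look at printed output while the test still passes, pytest's `-s` flag disables output capturing.

```python
def has_reasoning(result: tuple[str, int], min_chars: int = 1) -> bool:
    """Return True if the reasoning half of a (reasoning, score) pair is non-empty text."""
    reasoning, _score = result
    return isinstance(reasoning, str) and len(reasoning.strip()) >= min_chars


def test_single_dimension_has_reasoning() -> None:
    # Stand-in for a real evaluator response; the values here are made up.
    result = ("The agent reached its goal of agreeing on a price.", 7)
    assert has_reasoning(result), f"missing reasoning in {result!r}"
    assert isinstance(result[1], int)


# Run the test directly so the sketch also works outside pytest.
test_single_dimension_has_reasoning()
```

A failure message that includes the raw result gives the same visibility as `assert False`, but only when the check actually fails.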

ProKil · 2024-06-24T20:07:09Z

@XuhuiZhou Could you help check this? I think this is basically a prompting issue. Maybe it would work better if we changed the description of the goal dimension?

XuhuiZhou · 2024-06-28T01:17:11Z

@bugsz @ProKil Okay, I fixed this bug. Basically, the original instructions were a bit ambiguous, but they somehow magically worked when they were all used together.

codecov · 2024-06-28T01:18:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.01%. Comparing base (8d9b9be) to head (269b4f2).
Report is 2 commits behind head on main.

@@            Coverage Diff             @@
##             main     #123      +/-   ##
==========================================
+ Coverage   60.03%   62.01%   +1.98%     
==========================================
  Files          47       55       +8     
  Lines        2402     2733     +331     
==========================================
+ Hits         1442     1695     +253     
- Misses        960     1038      +78

Files	Coverage Δ
sotopia/envs/evaluators.py	`91.07% <100.00%> (+0.62%)`	⬆️
tests/envs/test_evaluators.py	`100.00% <100.00%> (ø)`

... and 13 files with indirect coverage changes

ProKil · 2024-06-28T19:20:07Z

@bugsz Could you check if this fixes your problem?

add a test case for a single dimension evaluation

ab57964

fix the single dimension bug

269b4f2

XuhuiZhou requested a review from ProKil June 28, 2024 01:21

ProKil requested changes Jun 28, 2024

sotopia/envs/evaluators.py (resolved)

ProKil assigned XuhuiZhou Jun 29, 2024

ProKil approved these changes Jul 2, 2024

ProKil merged commit 4559cfa into main Jul 2, 2024
8 checks passed

ProKil deleted the bug/evaluate_single_dimension branch July 2, 2024 15:11

bugsz mentioned this pull request Jul 2, 2024

Change the input type of the ReachGoalLLMEvaluator #129

Merged
