Add a test case for a single dimension evaluation #123

Merged
2 commits merged into main on Jul 2, 2024

Conversation

bugsz
Contributor

@bugsz bugsz commented Jun 24, 2024

📑 Description

This PR adds a test case for the issue mentioned in #89.
Specifically, it adds a dummy evaluator with only one goal evaluation dimension, plus a new option for the `response_format` in the evaluator.
The test also uses the same format as a real Sotopia simulation, which keeps the test case aligned with the actual evaluation.
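For readers following along, here is a minimal sketch of what a single-dimension response format could look like. It is not the code from this PR: the names `GoalOnlyEvaluation` and `parse_goal_only`, the `(reasoning, score)` pair layout, and the 0 to 10 score range are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass


@dataclass
class GoalOnlyEvaluation:
    """Hypothetical response format with a single 'goal' dimension.

    The dimension is assumed to be a (reasoning, score) pair.
    """

    goal: tuple[str, int]

    def __post_init__(self) -> None:
        reasoning, score = self.goal
        if not isinstance(reasoning, str) or not isinstance(score, int):
            raise TypeError("goal must be a (reasoning: str, score: int) pair")
        # Assumed score range for illustration only.
        if not 0 <= score <= 10:
            raise ValueError("goal score must be between 0 and 10")


def parse_goal_only(raw: str) -> GoalOnlyEvaluation:
    """Parse a JSON string such as '{"goal": ["some reasoning", 8]}'."""
    data = json.loads(raw)
    reasoning, score = data["goal"]
    return GoalOnlyEvaluation(goal=(reasoning, score))


if __name__ == "__main__":
    print(parse_goal_only('{"goal": ["The agent reached its goal.", 8]}'))
```

Validating the pair in one place means a test can fail on a missing or malformed reasoning field instead of silently accepting a bare score.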

@bugsz
Contributor Author

bugsz commented Jun 24, 2024

By the way, I am currently using `assert False` to see the output, so the current pytest run is definitely not passing. However, I do not know how to check whether there is a reasoning part. Does anyone have an idea?
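One possible way to approach this, sketched under assumptions: running `pytest -s` disables output capture, so printed values are visible without forcing a failure, and the reasoning can be checked with an ordinary assertion. The helper `has_reasoning` and the shape of `evaluation` below are hypothetical stand-ins, not the evaluator's real output.

```python
def has_reasoning(dimension: tuple[str, int]) -> bool:
    """Return True if the (reasoning, score) pair carries non-empty reasoning text."""
    reasoning, _score = dimension
    return isinstance(reasoning, str) and bool(reasoning.strip())


def test_goal_dimension_has_reasoning() -> None:
    # Hypothetical stand-in for the parsed evaluator response.
    evaluation = {"goal": ("The agent persuaded its partner as intended.", 8)}

    # Visible when running `pytest -s`; no need for `assert False`.
    print(evaluation)

    assert has_reasoning(evaluation["goal"])
    # A whitespace-only reasoning string should count as missing.
    assert not has_reasoning(("   ", 8))
```

This keeps the test green when the response is well formed and still fails loudly when the reasoning part is absent.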

@ProKil
Member

ProKil commented Jun 24, 2024

@XuhuiZhou Could you help check this? I think this is basically a prompting issue. Maybe changing the description of the goal dimension would make it work better?

@XuhuiZhou
Member

XuhuiZhou commented Jun 28, 2024

@bugsz @ProKil Okay, I fixed this bug. Basically, the original instructions were a bit ambiguous on their own, but they somehow magically worked when they were combined.

codecov bot commented Jun 28, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.01%. Comparing base (8d9b9be) to head (269b4f2).
Report is 2 commits behind head on main.

@@            Coverage Diff             @@
##             main     #123      +/-   ##
==========================================
+ Coverage   60.03%   62.01%   +1.98%     
==========================================
  Files          47       55       +8     
  Lines        2402     2733     +331     
==========================================
+ Hits         1442     1695     +253     
- Misses        960     1038      +78     
| Files | Coverage Δ |
|---|---|
| sotopia/envs/evaluators.py | 91.07% <100.00%> (+0.62%) ⬆️ |
| tests/envs/test_evaluators.py | 100.00% <100.00%> (ø) |

... and 13 files with indirect coverage changes

@XuhuiZhou XuhuiZhou requested a review from ProKil June 28, 2024 01:21
@ProKil
Copy link
Member

ProKil commented Jun 28, 2024

@bugsz Could you check if this fixes your problem?

@ProKil ProKil merged commit 4559cfa into main Jul 2, 2024
8 checks passed
@ProKil ProKil deleted the bug/evaluate_single_dimension branch July 2, 2024 15:11