diff --git a/evaluation_results/2024-12-21_13-57-31/results.md b/evaluation_results/2024-12-21_13-57-31/results.md
index 3be0a9c..1923c0d 100644
--- a/evaluation_results/2024-12-21_13-57-31/results.md
+++ b/evaluation_results/2024-12-21_13-57-31/results.md
@@ -4,9 +4,7 @@ There are 4 scenarios and 4 test cases with 3 attempts (48 total tests).
## Test: blank_math
### claude_sonnet_latest_with_seg
-
-
-
+
```
10
@@ -15,36 +13,22 @@ There are 4 scenarios and 4 test cases with 3 attempts (48 total tests).
### gpt-4o-mini_no_seg
-
-
-
-
-
+
### gpt-4o_with_seg
-
-
-
-
-
+
```
10
```
### claude_sonnet_latest_no_seg
-
-
-
-
-
+
## Test: tic_tac_toe_1
### claude_sonnet_latest_with_seg
-
-
-
+
```
Your turn! Place an O anywhere you'd like.
@@ -53,83 +37,39 @@ Your turn! Place an O anywhere you'd like.
### gpt-4o-mini_no_seg
-
-
-
-
-
+
### gpt-4o_with_seg
-
-
-
-
-
+
### claude_sonnet_latest_no_seg
-
-
-
-
-
+
## Test: x_in_box
### claude_sonnet_latest_with_seg
-
-
-
-
-
+
### gpt-4o-mini_no_seg
-
-
-
-
-
+
### gpt-4o_with_seg
-
-
-
-
-
+
### claude_sonnet_latest_no_seg
-
-
-
-
-
+
## Test: x_in_boxes
### claude_sonnet_latest_with_seg
-
-
-
-
-
+
### gpt-4o-mini_no_seg
-
-
-
-
-
+
### gpt-4o_with_seg
-
-
-
-
-
+
### claude_sonnet_latest_no_seg
-
-
-
-
-
+