Update README.md
Commit 0523050 by migtissera (1 parent: fddc2ac)
README.md CHANGED
@@ -12,6 +12,17 @@ model-index:
 
 <br>
 
+# Evaluations
+
+| | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
+|--------------|------------------|------------------|-------------|
+| GPQA | 41.5% | 41.6% | 40.2% |
+| MMLU | 81.6% | - | 82.0% |
+| MATH | 64.2% | 69.4% | 70.2% |
+| MMLU-Pro | 65.6% | 65.0% | - |
+| HumanEval | | 88.1% | 87.2% |
+| DROP (F1 Score) | | 83.1% | 79.7% |
+
 
 Welcome to the Tess-Reasoning-1 (Tess-R1) series of models. Tess-R1 is designed with test-time compute in mind, and has the capabilities to produce a Chain-of-Thought (CoT) reasoning before producing the final output.
 