migtissera
commited on
Commit
•
67adc87
1
Parent(s):
b94a69a
Update README.md
Browse files
README.md
CHANGED
@@ -39,6 +39,7 @@ The system message *must* be the following:
|
|
39 |
|
40 |
|
41 |
# Evaluations
|
|
|
42 |
| | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
|
43 |
|--------------|------------------|------------------|-------------|
|
44 |
| GPQA | 41.5% | 41.6% | 40.2% |
|
|
|
39 |
|
40 |
|
41 |
# Evaluations
|
42 |
+
Since the model is trained to use test-time-compute, the evalutations were performed by first setting the system message, and then extracting the contents between the `<output>` `</output>` tags. Only the contents between the tags were then used for the evaluations.
|
43 |
| | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
|
44 |
|--------------|------------------|------------------|-------------|
|
45 |
| GPQA | 41.5% | 41.6% | 40.2% |
|