migtissera committed
Commit 4936c0b
Parent(s): e3350e4
Update README.md
README.md CHANGED
@@ -31,7 +31,6 @@ The model was trained mostly with Chain-of-Thought reasoning data, including the
 
 
 # Evaluations
-The below evaluations were performed using a fork of Glaive's `simple-evals` codebase. Many thanks to @winglian for performing the evals. The codebase for evaluations can be found here: https://github.com/winglian/simple-evals
 | | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
 |--------------|------------------|------------------|-------------|
 | GPQA | 41.5% | 41.6% | 40.2% |
@@ -40,6 +39,13 @@ The below evaluations were performed using a fork of Glaive's `simple-evals` cod
 | MMLU-Pro | 65.6% | 65.0% | - |
 | HumanEval | | 88.1% | 87.2% |
 
+The evaluations were performed using a fork of Glaive's `simple-evals` codebase. Many thanks to @winglian for performing the evals. The codebase for evaluations can be found here: https://github.com/winglian/simple-evals
+
+Example to run evaluations:
+`python run_reflection_eval.py tess_r1_70b --evals gpqa mmlu math`
+
+The system message has been edited in the sampler to reflect Tess-R1's system prompt.
+
 # Prompt Format
 The model uses Llama3 prompt format.
 
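The added note about editing the sampler's system message could be sketched roughly as follows, assuming the fork keeps upstream simple-evals' `ChatCompletionSampler`; the import path, constructor arguments, and model id below are assumptions rather than the fork's confirmed API:

```python
# Hypothetical sketch: pin Tess-R1's system prompt on the sampler used for the evals.
# Assumes the fork keeps upstream simple-evals' ChatCompletionSampler (an
# OpenAI-compatible chat sampler); adjust names to match the fork.
from sampler.chat_completion_sampler import ChatCompletionSampler

TESS_R1_SYSTEM_PROMPT = "..."  # Tess-R1's actual system prompt goes here

sampler = ChatCompletionSampler(
    model="tess_r1_70b",                   # model id passed to run_reflection_eval.py
    system_message=TESS_R1_SYSTEM_PROMPT,  # replaces the default system message
    max_tokens=2048,
)
```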
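For the prompt format, a minimal single-turn example of the standard Llama 3 chat template is shown below; `{system_prompt}` and `{user_message}` are placeholders, and Tess-R1's actual system prompt is not reproduced here:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user_message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

The model's reply is generated after the final assistant header and terminates with `<|eot_id|>`.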