migtissera committed
Commit
4936c0b
1 Parent(s): e3350e4

Update README.md

Files changed (1)
  1. README.md +7 -1
README.md CHANGED
@@ -31,7 +31,6 @@ The model was trained mostly with Chain-of-Thought reasoning data, including the
 
 
 # Evaluations
-The below evaluations were performed using a fork of Glaive's `simple-evals` codebase. Many thanks to @winglian for performing the evals. The codebase for evaluations can be found here: https://github.com/winglian/simple-evals
 | | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
 |--------------|------------------|------------------|-------------|
 | GPQA | 41.5% | 41.6% | 40.2% |
@@ -40,6 +39,13 @@ The below evaluations were performed using a fork of Glaive's `simple-evals` cod
 | MMLU-Pro | 65.6% | 65.0% | - |
 | HumanEval | | 88.1% | 87.2% |
 
+The evaluations were performed using a fork of Glaive's `simple-evals` codebase. Many thanks to @winglian for performing the evals. The codebase for evaluations can be found here: https://github.com/winglian/simple-evals
+
+Example to run evaluations:
+`python run_reflection_eval.py tess_r1_70b --evals gpqa mmlu math`
+
+The system message has been edited in the sampler to reflect Tess-R1's system prompt.
+
 # Prompt Format
 The model uses Llama3 prompt format.
 
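A note on the system-message edit mentioned in the added README text above: the fork's exact wiring is not shown in this commit, but upstream `simple-evals` exposes a `ChatCompletionSampler` that accepts a `system_message` argument, so the override plausibly looks something like the sketch below. The model identifier and system-prompt string are placeholders, not values taken from the fork.

```python
# Hypothetical sketch only -- not code from winglian's fork. It assumes the fork keeps
# upstream simple-evals' ChatCompletionSampler, which accepts an optional system_message.
from sampler.chat_completion_sampler import ChatCompletionSampler

# Placeholder: substitute Tess-R1's actual system prompt here.
TESS_R1_SYSTEM_PROMPT = "..."

sampler = ChatCompletionSampler(
    model="tess_r1_70b",                   # assumed identifier, mirroring the CLI example above
    system_message=TESS_R1_SYSTEM_PROMPT,  # the edit described in the README note
    temperature=0.0,
    max_tokens=2048,
)
```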
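For reference, "Llama3 prompt format" refers to the standard Llama 3 chat template built from `<|start_header_id|>` / `<|eot_id|>` special tokens. A minimal sketch of assembling a single-turn prompt in that format follows; the system prompt string is a placeholder, not Tess-R1's actual system prompt.

```python
# Minimal sketch of the standard Llama 3 chat template. The system prompt passed in below
# is a placeholder and not Tess-R1's actual system prompt.
def build_llama3_prompt(system_message: str, user_message: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_message}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


print(build_llama3_prompt("<system prompt goes here>", "Write a limerick about evals."))
```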