# Introduction

Welcome to the Tess-Reasoning-1 (Tess-R1) series of models. Tess-R1 is designed with test-time compute in mind, and can produce Chain-of-Thought (CoT) reasoning before writing its final output.

The model is trained to first think step-by-step and to contemplate its answers. It can also write alternatives after contemplating. Once all the steps have been thought through, it writes the final output.

3. `<alternatively>` `</alternatively>` tags are used for alternate suggestions.
4. Finally, `<output>` `</output>` tags are used for the final output.

## Important Note

In a multi-turn conversation, only the contents between the `<output>` `</output>` tags (discarding the tags themselves) should be carried forward. Otherwise, the model will see out-of-distribution input data and will fail.

The model was trained mostly on Chain-of-Thought reasoning data that includes the XML tags. However, to help its generations generalize, some single-turn and multi-turn data without the XML tags were also included. Because of this, the model sometimes omits the XML tags and does not fully utilize its test-time compute capabilities. There are two ways to get around this (see the sketch after this list):

- Include a try/except block in your inference script, and only carry forward the contents between the `<output>` `</output>` tags when they are available.
- Use the `<thinking>` tag as the seed of the generation to force the model to produce outputs with the XML tags, i.e.: `f"{conversation}{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n<thinking>"`
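
Here is a minimal sketch of both workarounds, assuming plain-string Llama3 prompts; the helper names are illustrative, not part of the model's API:

```python
import re

def extract_output(generation: str) -> str:
    # Carry forward only the text between the <output> tags; fall back to
    # the full generation when the model skipped the XML tags.
    try:
        return re.search(r"<output>(.*?)</output>", generation, re.DOTALL).group(1).strip()
    except AttributeError:
        return generation.strip()

def seed_with_thinking(conversation: str, user_input: str) -> str:
    # Pre-fill the assistant turn with an opening <thinking> tag so the
    # model continues in CoT mode (Llama3 prompt format).
    return (
        f"{conversation}{user_input}"
        "<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n<thinking>"
    )
```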

# Evaluations

The evaluations below were performed using a fork of Glaive's `simple-evals` codebase. Many thanks to @winglian for running the evals. The evaluation code can be found here: https://github.com/winglian/simple-evals

| Benchmark  | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
|------------|------------------|------------------|-------------|
| GPQA       | 41.5%            | 41.6%            | 40.2%       |
| HumanEval  |                  | 88.1%            | 87.2%       |

# Prompt Format

The model uses the Llama3 prompt format.

# System Message

The system message *must* be the following:

```
You are Tess-R1, an advanced AI that was created for complex reasoning. Given a user query, you are able to first create a Chain-of-Thought (CoT) reasoning. Once the CoT is devised, you then proceed to first think about how to answer. While doing this, you have the capability to contemplate on the thought, and also provide alternatives. Once the CoT steps have been thought through, you then respond by creating the final output.
```
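
For reference, a single-turn prompt in the Llama3 format with this system message can be assembled as follows (a sketch; the special tokens are from the standard Llama3 chat template, and `SYSTEM_MESSAGE` holds the text above):

```python
SYSTEM_MESSAGE = (
    "You are Tess-R1, an advanced AI that was created for complex reasoning. "
    "..."  # abbreviated here; use the full system message shown above
)

def build_prompt(user_input: str) -> str:
    # Llama3 chat template: a system turn, a user turn, then an open
    # assistant turn for the model to complete.
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{SYSTEM_MESSAGE}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_input}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```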

# Inference

I have included a sample Python script below. The script uses a try/except block to carry forward the model generations in a multi-turn conversation.

  ```python
  import torch, json