File size: 8,232 Bytes
fdd4714 8d6e025 fdd4714 de1a7d7 fdd4714 0523050 f81c056 fdd4714 e3350e4 fdd4714 e3350e4 f81c056 b94a69a e3350e4 67adc87 f81c056 39eb076 fdd4714 4936c0b fdd4714 e3350e4 fdd4714 9f4c0a1 fdd4714 9f4c0a1 fdd4714 9f4c0a1 fdd4714 9f4c0a1 8511385 9f4c0a1 fdd4714 9f4c0a1 fdd4714 8511385 fdd4714 9f4c0a1 fdd4714 9f4c0a1 fdd4714 9f4c0a1 fdd4714 9f4c0a1 fdd4714 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 |
---
license: llama3.1
base_model: meta-llama/Llama-3.1-70B
model-index:
- name: Tess-R1-Llama-3.1-70B
results: []
---
# Tess-R1 Limerick (Llama-3.1-70B)
![Tess-R1-Llama-3.1-70B](https://huggingface.co/migtissera/Tess-R1-Llama-3.1-70B/resolve/main/Tess-R1-2.jpg)
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
# Introduction
Welcome to the Tess-Reasoning-1 (Tess-R1) series of models. Tess-R1 is designed with test-time compute in mind, and has the capabilities to produce a Chain-of-Thought (CoT) reasoning before producing the final output.
The model is trained to first think step-by-step, and contemplate on its answers. It can also write alternatives after contemplating. Once all the steps have been thought through, it writes the final output.
1. Step-by-step, Chain-of-Thought thinking process. Uses `<thinking>` `</thinking>` tags to indicate when the model is performing CoT.
2. `<contemplation>` `</contemplation>` tags are used when the model contemplate on its answers.
3. `<alternatively>` `</alternatively>` tags are used for alternate suggestions.
4. Finally, `<output>` `</output>` tags are used for the final output
## Important Note:
In a multi-turn conversation, only the contents between the `<output>` `</output>` tags (discarding the tags) should be carried forward. Otherwise the model will see out of distribution input data and will fail.
The model was trained mostly with Chain-of-Thought reasoning data, including the XML tags. However, to generalize model generations, some single-turn and multi-turn data without XML tags were also included. Due to this, in some instances the model does not produce XML tags and does not fully utilize test-time compute capabilities. There is two ways to get around this:
- Include a try/catch statement in your inference script, and only pass on the contents between the `<output>` `</output>` tags if it's available.
- Use the `<thinking>` tag as the seed in the generation, and force the model to produce outputs with XML tags. i.e: `f"{conversation}{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n<thinking>"`
# Prompt Format
The model uses Llama3 prompt format.
# System Message
The system message *must* be the following:
```You are Tess-R1, an advanced AI that was created for complex reasoning. Given a user query, you are able to first create a Chain-of-Thought (CoT) reasoning. Once the CoT is devised, you then proceed to first think about how to answer. While doing this, you have the capability to contemplate on the thought, and also provide alternatives. Once the CoT steps have been thought through, you then respond by creating the final output.```
# Evaluations
Since the model is trained to use test-time-compute, the evalutations were performed by first setting the system message, and then extracting the contents between the `<output>` `</output>` tags. Only the contents between the tags were then used for the evaluations.
| | Tess-R1 Limerick | Claude 3.5 Haiku | GPT-4o mini |
|--------------|------------------|------------------|-------------|
| GPQA | 41.5% | 41.6% | 40.2% |
| MMLU | 81.6% | - | 82.0% |
| MATH | 64.2% | 69.4% | 70.2% |
| MMLU-Pro | 65.6% | 65.0% | - |
| HumanEval | 61.0% | 88.1% | 87.2% |
The evaluations were performed using a fork of Glaive's `simple-evals` codebase. Many thanks to @winglian for performing the evals. The codebase for evaluations can be found here: https://github.com/winglian/simple-evals
Example to run evaluations:
`python run_reflection_eval.py tess_r1_70b --evals gpqa mmlu math`
The system message have been edited in the sampler to reflect Tess-R1's system prompt.
# Inference
I have included a sample Python script below. This script uses a try/catch statement to carry forward the model generations in a multi-turn conversation.
```python
import torch, json
from transformers import AutoModelForCausalLM, AutoTokenizer
import re
class LLM(object):
def __init__(self, model_path):
self.model = AutoModelForCausalLM.from_pretrained(
model_path,
torch_dtype=torch.bfloat16,
device_map="auto",
load_in_4bit=False,
trust_remote_code=False,
)
self.tokenizer = AutoTokenizer.from_pretrained(
model_path, trust_remote_code=False
)
self.terminators = [
self.tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
self.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
def generate_text(self, instruction):
tokens = self.tokenizer.encode(instruction)
tokens = torch.LongTensor(tokens).unsqueeze(0)
tokens = tokens.to("cuda")
instance = {
"input_ids": tokens,
"top_p": 1.0,
"temperature": 0.75,
"generate_len": 4096,
"top_k": 50,
}
length = len(tokens[0])
with torch.no_grad():
rest = self.model.generate(
input_ids=tokens,
max_length=length + instance["generate_len"],
use_cache=True,
do_sample=True,
top_p=instance["top_p"],
temperature=instance["temperature"],
top_k=instance["top_k"],
num_return_sequences=1,
pad_token_id=self.tokenizer.eos_token_id,
eos_token_id=self.terminators,
)
output = rest[0][length:]
string = self.tokenizer.decode(output, skip_special_tokens=True)
return f"{string}"
def extract_output(self, text):
pattern = r"<output>(.*?)</output>"
match = re.search(pattern, text, re.DOTALL)
content = match.group(1).strip()
return content
def respond_llama3(self, user_prompt):
conversation = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are Tess-R1, an advanced AI that was created for complex reasoning. Given a user query, you are able to first create a Chain-of-Thought (CoT) reasoning. Once the CoT is devised, you then proceed to first think about how to answer. While doing this, you have the capability to contemplate on the thought, and also provide alternatives. Once the CoT steps have been thought through, you then respond by creating the final output.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"""
llm_prompt = f"{conversation}{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
answer = self.generate_text(llm_prompt)
try:
answer_output = self.extract_output(answer)
return answer_output
except:
return answer
model_path = "neurolattice/Tess-R1-Llama-3.1-70B"
llm = LLM(model_path)
conversation = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are Tess-R1, an advanced AI that was created for complex reasoning. Given a user query, you are able to first create a Chain-of-Thought (CoT) reasoning. Once the CoT is devised, you then proceed to first think about how to answer. While doing this, you have the capability to contemplate on the thought, and also provide alternatives. Once the CoT steps have been thought through, you then respond by creating the final output.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"""
while True:
user_input = input("You: ")
llm_prompt = f"{conversation}{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
answer = llm.generate_text(llm_prompt)
print("=" * 132)
print(answer)
try:
answer_output = llm.extract_output(answer)
print("=" * 132)
print(answer_output)
conversation = f"{llm_prompt}{answer_output}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
except:
conversation = f"{llm_prompt}{answer}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
``` |