Triangle104 committed d55bd93 (parent: 6dd1cd1): Update README.md

README.md
Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) for more details on the model.

---

Model details:

Tülu3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques. Tülu3 is designed for state-of-the-art performance on a diverse range of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
Model description

Model type: A model trained on a mix of publicly available, synthetic, and human-created datasets.
Language(s) (NLP): Primarily English
License: Llama 3.1 Community License Agreement
Finetuned from model: allenai/Llama-3.1-Tulu-3-8B-DPO

Model Sources

Training Repository: https://github.com/allenai/open-instruct
Eval Repository: https://github.com/allenai/olmes
Paper: https://arxiv.org/abs/2411.15124
Demo: https://playground.allenai.org/

Using the model

Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:

```python
from transformers import AutoModelForCausalLM

tulu_model = AutoModelForCausalLM.from_pretrained("allenai/Llama-3.1-Tulu-3-8B")
```

vLLM

As a Llama base model, the model can be easily served with:

```shell
vllm serve allenai/Llama-3.1-Tulu-3-8B
```

Note that given the long chat template of Llama, you may want to use `--max_model_len=8192`.
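The `vllm serve` command exposes an OpenAI-compatible HTTP API; the sketch below builds a chat-completions request payload for it. The endpoint URL (vLLM's default `http://localhost:8000/v1`) and the `max_tokens` value are assumptions, not details from this card.

```python
import json

# Chat-completions payload for a locally served allenai/Llama-3.1-Tulu-3-8B.
# The URL assumes vLLM's default host/port; adjust if you serve elsewhere.
url = "http://localhost:8000/v1/chat/completions"  # assumed default endpoint
payload = {
    "model": "allenai/Llama-3.1-Tulu-3-8B",
    "messages": [{"role": "user", "content": "How are you doing?"}],
    "max_tokens": 256,  # illustrative value
}
body = json.dumps(payload)

# To actually send it (server must be running):
#   import urllib.request
#   req = urllib.request.Request(
#       url, data=body.encode(), headers={"Content-Type": "application/json"}
#   )
#   print(urllib.request.urlopen(req).read().decode())
```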

Chat template

The chat template for our models is formatted as:

```
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

Or with new lines expanded:

```
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

It is embedded within the tokenizer as well, for `tokenizer.apply_chat_template`.
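For illustration, the layout above can be reproduced by hand. `render_chat` below is a hypothetical helper that mirrors the documented format, not the tokenizer's actual Jinja template; use `tokenizer.apply_chat_template` in practice.

```python
def render_chat(messages, add_generation_prompt=True):
    """Render messages in the chat layout shown above (illustrative only)."""
    prompt = ""
    for m in messages:
        if m["role"] == "user":
            prompt += "<|user|>\n" + m["content"] + "\n"
        elif m["role"] == "assistant":
            prompt += "<|assistant|>\n" + m["content"] + "<|endoftext|>"
    if add_generation_prompt:
        # Leave the assistant header open so the model continues from here.
        prompt += "<|assistant|>\n"
    return prompt

prompt = render_chat([{"role": "user", "content": "How are you doing?"}])
# prompt ends with "<|assistant|>\n", ready for generation
```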

System prompt

In Ai2 demos, we use this system prompt by default:

```
You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```

The model has not been trained with a specific system prompt in mind.
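Since the template is message-based, using the demo system prompt is just a matter of prepending a `system` message; a minimal sketch (the helper name is mine, not part of any API):

```python
# Prepend the Ai2 demo system prompt as a standard "system" message.
SYSTEM_PROMPT = (
    "You are Tulu 3, a helpful and harmless AI Assistant built by "
    "the Allen Institute for AI."
)

def with_system_prompt(user_messages):
    """Return a message list with the demo system prompt prepended."""
    return [{"role": "system", "content": SYSTEM_PROMPT}] + list(user_messages)

msgs = with_system_prompt([{"role": "user", "content": "How are you doing?"}])
```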
Bias, Risks, and Limitations

[...] to train the base Llama 3.1 models, however it is likely to have included a mix of Web data and technical sources like books and code. See the Falcon 180B model card for an example of this.

Hyperparameters

PPO settings for RLVR:

Learning Rate: 3 × 10⁻⁷
Discount Factor (gamma): 1.0
General Advantage Estimation (lambda): 0.95
[...]
Total Episodes: 100,000
KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
Warm up ratio (omega): 0.0
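For reference, the settings listed above can be collected into a single mapping. This is a sketch; the key names are illustrative and do not mirror the actual open-instruct training config.

```python
# RLVR PPO settings from the list above, gathered into one mapping.
# Key names are illustrative, not taken from the training repository.
ppo_settings = {
    "learning_rate": 3e-7,
    "discount_factor_gamma": 1.0,
    "gae_lambda": 0.95,
    "total_episodes": 100_000,
    "kl_penalty_beta": [0.1, 0.05, 0.03, 0.01],  # varied across runs
    "warmup_ratio_omega": 0.0,
}
```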
License and use

[...] The models have been fine-tuned using a dataset mix with outputs generated from third-party models and are subject to additional terms: the Gemma Terms of Use and the Qwen License Agreement (models were improved using Qwen 2.5).

Citation

If Tülu3 or any of the related materials were helpful to your work, please cite:

[...]
}

---

## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)