slimfrikha-tii committed on
Commit
2b69b8f
·
1 Parent(s): 47bca49

docs(readme.md): init readme

Files changed (1)
  1. README.md +305 -3
README.md CHANGED
@@ -1,3 +1,305 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ language:
3
+ - en
4
+ tags:
5
+ - falcon3
6
+ ---
7
+
8
+
9
+ # Table of Contents
10
+
11
+ 0. [TL;DR](#tldr)
12
+ 1. [Model Details](#model-details)
13
+ 2. [Usage](#usage)
14
+ 3. [Training Details](#training-details)
15
+ 4. [Evaluation](#evaluation)
16
+
17
+
18
+ # TL;DR
19
+ The Falcon 3 family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
20
+
21
+ This repository contains Falcon3-7B-Instruct, the best instruct LLM under 8B parameters at the time of release.
22
+
23
+ # Model Details
24
+
25
+ ## Model Description
26
+
27
+ - **Developed by:** [https://www.tii.ae](https://www.tii.ae)
28
+ - **Model type:** Causal decoder-only
29
+ - **Architecture:** Transformer-based
30
+ - **Language(s) (NLP):** Mainly English
31
+ - **License:** TII Falcon-LLM License 2.0
32
+
33
+ <br>
34
+
35
+ # Usage
36
+
37
+ Below is an example of how to use the model with `transformers` (make sure you have the latest release of `transformers`, or a build from source):
38
+
39
+ <details>
40
+ <summary> Click to expand </summary>
41
+
42
+ ```python
43
+ # Load the model and tokenizer classes from Hugging Face transformers
44
+
45
+
46
+ from transformers import AutoModelForCausalLM, AutoTokenizer
47
+
48
+ model_name = "tiiuae/Falcon3-7B-Instruct"
49
+
50
+ model = AutoModelForCausalLM.from_pretrained(
51
+     model_name,
52
+     torch_dtype="auto",
53
+     device_map="auto"
54
+ )
55
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
56
+
57
+ prompt = "How many hours in one day?"
58
+ messages = [
59
+     {"role": "system", "content": "You are a helpful friendly assistant Falcon3 from TII, try to follow instructions as much as possible."},
60
+     {"role": "user", "content": prompt}
61
+ ]
62
+ text = tokenizer.apply_chat_template(
63
+     messages,
64
+     tokenize=False,
65
+     add_generation_prompt=True
66
+ )
67
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
68
+
69
+ generated_ids = model.generate(
70
+     **model_inputs,
71
+     max_new_tokens=1024
72
+ )
73
+ generated_ids = [
74
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
75
+ ]
76
+
77
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
78
+ print(response)
79
+ ```
80
+
81
+ </details>
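
For a decoder-only model, `generate` returns each prompt followed by the newly generated tokens, which is why the example above slices `output_ids[len(input_ids):]` before decoding. The sketch below isolates that step on plain Python lists so it can be run without downloading the model. The helper name `strip_prompt_tokens` and the token ids are hypothetical and used only for illustration.

```python
def strip_prompt_tokens(input_ids, output_ids):
    """Drop the echoed prompt from each generated sequence.

    Both arguments are batches (lists of token-id lists). Each output
    sequence is assumed to start with its own prompt tokens.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Made-up token ids, for illustration only.
prompt_batch = [[11, 12, 13]]
generated_batch = [[11, 12, 13, 42, 43]]

new_tokens = strip_prompt_tokens(prompt_batch, generated_batch)
print(new_tokens)  # [[42, 43]]
```

Without this step, the decoded response would repeat the system and user messages before the model's answer.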
82
+
83
+
84
+ # Training Details
85
+ Starting from `tiiuae/Falcon3-7B-Base`, the post-training stage comprises supervised finetuning followed by human preference alignment (DPO).
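
For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO objective for a single preference pair. It is not the training code used for this model: the function name `dpo_loss`, the value `beta=0.1`, and the log-probabilities in the usage lines are all illustrative assumptions.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    the policy being trained or under the frozen reference model.
    `beta` is a hypothetical value, not one reported in this README.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) written as log(1 + exp(-x))
    return math.log1p(math.exp(-logits))


# When the policy equals the reference, the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# A policy that raises the chosen response's likelihood gets a lower loss.
improved = dpo_loss(-4.0, -8.0, -5.0, -7.0)
print(baseline, improved)
```

The loss only depends on how far the policy has moved relative to the reference model, which is what keeps the aligned model close to the supervised checkpoint.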
86
+
87
+ ## Supervised finetuning
88
+ ### Training Data
89
+ 1.2 million diverse, high-quality samples drawn from Tulu-3, Open-Hermes, Numina, and Apigen.
90
+
91
+ | Data type | Ratio |
92
+ |--------------------------------------|-------|
93
+ | Conversations | 32% |
94
+ | STEM | 32% |
95
+ | Code | 12% |
96
+ | Safety | 9.1% |
97
+ | Multilingual | 8.3% |
98
+ | Function call | 3.3% |
99
+ | NLP (summarization, generation, QA) | 3.2% |
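
To give a sense of scale, the ratios in the table above can be turned into approximate sample counts. The sketch below assumes the percentages apply to the stated total of 1.2 million samples; the resulting counts are derived estimates, not figures reported by the authors.

```python
TOTAL_SAMPLES = 1_200_000  # "1.2 million" from the text above

# Ratios copied from the data mix table, in percent.
MIX = {
    "Conversations": 32.0,
    "STEM": 32.0,
    "Code": 12.0,
    "Safety": 9.1,
    "Multilingual": 8.3,
    "Function call": 3.3,
    "NLP (summarization, generation, QA)": 3.2,
}

# Approximate number of samples per data type.
counts = {name: round(TOTAL_SAMPLES * pct / 100) for name, pct in MIX.items()}

for name, n in counts.items():
    print(f"{name:<38} {n:>9,}")

# The listed ratios add up to 99.9%, presumably because of rounding.
print(f"total ratio: {sum(MIX.values()):.1f}%")
```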
100
+
101
+ #### Training Hyperparameters
102
+
103
+ <style type="text/css">
104
+ .tg {border-collapse:collapse;border-spacing:0;}
105
+ .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
106
+ overflow:hidden;padding:10px 5px;word-break:normal;}
107
+ .tg th{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
108
+ font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
109
+ .tg .tg-c3ow{border-color:inherit;text-align:center;vertical-align:top}
110
+ .tg .tg-7btt{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
111
+ .tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
112
+ .tg .tg-ihkz{border-color:inherit;text-align:center;vertical-align:top}
113
+ .tg .tg-pcvp{border-color:inherit;text-align:left;vertical-align:top}
114
+ .tg .tg-j2vi{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
115
+ .tg .tg-amwm{border-color:inherit;text-align:left;vertical-align:top}
116
+ .tg .tg-0lax{border-color:inherit;font-weight:bold;text-align:center;vertical-align:top}
117
+ </style>
118
+ <table class="tg"><thead>
119
+ <tr>
120
+ <th class="tg-7btt" rowspan="3">AdamW</th>
121
+ <th class="tg-c3ow">β1</th>
122
+ <th class="tg-0pky">0.9</th>
123
+ </tr>
124
+ <tr>
125
+ <th class="tg-ihkz">β2</th>
126
+ <th class="tg-pcvp">0.999</th>
127
+ </tr>
128
+ <tr>
129
+ <th class="tg-c3ow">weight decay</th>
130
+ <th class="tg-0pky">0.01</th>
131
+ </tr></thead>
132
+ <tbody>
133
+ <tr>
134
+ <td class="tg-j2vi" rowspan="4">Learning rate</td>
135
+ <td class="tg-ihkz">type</td>
136
+ <td class="tg-pcvp">linear decay</td>
137
+ </tr>
138
+ <tr>
139
+ <td class="tg-c3ow">init lr</td>
140
+ <td class="tg-0pky">5e-6</td>
141
+ </tr>
142
+ <tr>
143
+ <td class="tg-ihkz">final lr</td>
144
+ <td class="tg-pcvp">0</td>
145
+ </tr>
146
+ <tr>
147
+ <td class="tg-c3ow">warmup ratio</td>
148
+ <td class="tg-0pky">0.03</td>
149
+ </tr>
150
+ <tr>
151
+ <td class="tg-j2vi">Batch size</td>
152
+ <td class="tg-ihkz"></td>
153
+ <td class="tg-pcvp">64</td>
154
+ </tr>
155
+ <tr>
156
+ <td class="tg-amwm">Epochs</td>
157
+ <td class="tg-0lax"></td>
158
+ <td class="tg-0lax">2</td>
159
+ </tr>
160
+ </tbody>
161
+ </table>
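
The learning-rate rows above can be read as a schedule. The sketch below is one plausible interpretation, not the authors' training code. It assumes that the 0.03 value is the fraction of total steps used for a linear warmup from zero, that "init lr" (5e-6) is the peak reached at the end of warmup, that the batch size of 64 counts samples, and that each epoch covers the full 1.2 million samples. The function name `learning_rate` is hypothetical.

```python
import math


def learning_rate(step, total_steps, peak_lr=5e-6, final_lr=0.0, warmup_ratio=0.03):
    """Linear warmup to peak_lr, then linear decay to final_lr."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = (step - warmup_steps) / decay_steps
    # Interpolate linearly from peak_lr down to final_lr.
    return peak_lr + (final_lr - peak_lr) * progress


# Step count implied by the table under the assumptions stated above.
samples, batch_size, epochs = 1_200_000, 64, 2
total_steps = math.ceil(samples / batch_size) * epochs

for step in (0, 500, 1125, 20_000, total_steps):
    print(step, learning_rate(step, total_steps))
```

Under these assumptions the run has 37,500 optimizer steps, of which the first 1,125 are warmup.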
162
+
163
+ ## Human preference alignment - DPO
164
+
165
+ ### Training Data
166
+ TODO
167
+
168
+ #### Training Hyperparameters
169
+ TODO
170
+
171
+
172
+ # Evaluation
173
+ The following table reports results from our internal benchmark pipeline:
174
+
175
+
176
+ <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
177
+ <colgroup>
178
+ <col style="width: 10%;">
179
+ <col style="width: 10%;">
180
+ <col style="width: 7%;">
181
+ <col style="width: 7%;">
182
+ <col style="width: 7%;">
183
+ <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
184
+ </colgroup>
185
+ <thead>
186
+ <tr>
187
+ <th>Category</th>
188
+ <th>Benchmark</th>
189
+ <th>Llama-3.1-8B-Instruct</th>
190
+ <th>Qwen2-7B-Instruct</th>
191
+ <th>Qwen2.5-7B-Instruct</th>
192
+ <th>Falcon3-7B-Instruct</th>
193
+ </tr>
194
+ </thead>
195
+ <tbody>
196
+ <tr>
197
+ <td rowspan="3">General</td>
198
+ <td>MMLU (5-shot)</td>
199
+ <td>-</td>
200
+ <td>-</td>
201
+ <td>-</td>
202
+ <td>-</td>
203
+ </tr>
204
+ <tr>
205
+ <td>MMLU-Pro (5-shot)</td>
206
+ <td>-</td>
207
+ <td>-</td>
208
+ <td>-</td>
209
+ <td>-</td>
210
+ </tr>
211
+ <tr>
212
+ <td>IFEval</td>
213
+ <td>-</td>
214
+ <td>-</td>
215
+ <td>-</td>
216
+ <td>-</td>
217
+ </tr>
218
+ <tr>
219
+ <td rowspan="2">Math</td>
220
+ <td>GSM8K (5-shot)</td>
221
+ <td>-</td>
222
+ <td>-</td>
223
+ <td>-</td>
224
+ <td>-</td>
225
+ </tr>
226
+ <tr>
227
+ <td>MATH (4-shot)</td>
228
+ <td>-</td>
229
+ <td>-</td>
230
+ <td>-</td>
231
+ <td>-</td>
232
+ </tr>
233
+ <tr>
234
+ <td rowspan="4">Reasoning</td>
235
+ <td>ARC Challenge (25-shot)</td>
236
+ <td>-</td>
237
+ <td>-</td>
238
+ <td>-</td>
239
+ <td>-</td>
240
+ </tr>
241
+ <tr>
242
+ <td>GPQA (0-shot)</td>
243
+ <td>-</td>
244
+ <td>-</td>
245
+ <td>-</td>
246
+ <td>-</td>
247
+ </tr>
248
+ <tr>
249
+ <td>MuSR (0-shot)</td>
250
+ <td>-</td>
251
+ <td>-</td>
252
+ <td>-</td>
253
+ <td>-</td>
254
+ </tr>
255
+ <tr>
256
+ <td>BBH (3-shot)</td>
257
+ <td>-</td>
258
+ <td>-</td>
259
+ <td>-</td>
260
+ <td>-</td>
261
+ </tr>
262
+ <tr>
263
+ <td rowspan="4">Commonsense Understanding</td>
264
+ <td>PIQA (0-shot)</td>
265
+ <td>-</td>
266
+ <td>-</td>
267
+ <td>-</td>
268
+ <td>-</td>
269
+ </tr>
270
+ <tr>
271
+ <td>SciQ (0-shot)</td>
272
+ <td>-</td>
273
+ <td>-</td>
274
+ <td>-</td>
275
+ <td>-</td>
276
+ </tr>
277
+ <tr>
278
+ <td>Winogrande (0-shot)</td>
279
+ <td>-</td>
280
+ <td>-</td>
281
+ <td>-</td>
282
+ <td>-</td>
283
+ </tr>
284
+ <tr>
285
+ <td>OpenBookQA (0-shot)</td>
286
+ <td>-</td>
287
+ <td>-</td>
288
+ <td>-</td>
289
+ <td>-</td>
290
+ </tr>
291
+ </tbody>
292
+ </table>
293
+
294
+
295
+ # Citation
296
+ If the Falcon3 series was helpful to your work, feel free to cite us.
297
+
298
+ ```
299
+ @misc{Falcon3,
300
+ title = {Falcon 3 family of Open Foundation Models},
301
+ author = {TII Team},
302
+ month = {December},
303
+ year = {2024}
304
+ }
305
+ ```