melaseddik committed on
Commit faf49f5
1 Parent(s): 0b20cce

Update README.md

Files changed (1):
  1. README.md +46 -157

README.md CHANGED
@@ -1,123 +1,62 @@
- ---
  language:
  - en
  tags:
  - falcon3
  ---

- # Table of Contents
-
- 0. [TL;DR](#TL;DR)
- 1. [Model Details](#model-details)
- 2. [Usage](#usage)
- 3. [Training Details](#training-details)
- 4. [Evaluation](#evaluation)
-
- # TL;DR
-
- # Model Details
-
- ## Model Description
-
- - **Developed by:** [https://www.tii.ae](https://www.tii.ae)
- - **Model type:** Causal decoder-only
- - **Architecture:** Transformer-base
- - **Language(s) (NLP):** Mainly English
- - **License:** TII Falcon-LLM License 2.0
-
- <br>
-
- # Usage
-
- Find below some example scripts on how to use the model in `transformers` (Make sure to have the latest transformers, or the one built from source):
-
- ## Using the Pytorch model with 🤗 transformers
-
- ### Running the model on a CPU
-
- <details>
- <summary> Click to expand </summary>
-
- ```python
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
- model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base")
-
- input_text = "Question: How many hours in one day? Answer: "
- input_ids = tokenizer(input_text, return_tensors="pt").input_ids
-
- outputs = model.generate(input_ids)
- print(tokenizer.decode(outputs[0]))
- ```
-
- </details>
-
- ### Running the model on a GPU
-
- <details>
- <summary> Click to expand </summary>
-
- ```python
- # pip install accelerate
- from transformers import AutoTokenizer, AutoModelForCausalLM

- tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
- model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base", device_map="auto")

- input_text = "Question: How many hours in one day? Answer: "
- input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")

- outputs = model.generate(input_ids)
- print(tokenizer.decode(outputs[0]))
- ```
- </details>

- ### Running the model on a GPU using `torch.compile`

  <details>
  <summary> Click to expand </summary>

  ```python
  import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
- model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base", torch_dtype=torch.bfloat16).to(0)
-
- model = torch.compile(model)
-
- input_text = "Question: How many hours in one day? Answer: "
- input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
-
- outputs = model.generate(input_ids)
- print(tokenizer.decode(outputs[0]))
  ```

  </details>

- # Training Details
-
- ## Training Data
-
- ## Training Procedure
-
- ### Training Hyperparameters
-
- | **Hyperparameter** | **Value** | **Comment** |
- |--------------------|------------|-------------------------------------------|
- | Precision | `bfloat16` | |
- | Optimizer | AdamW | |
- | Max learning rate | | Following a WSD (warmup-stable-decay) learning rate schedule |
- | Weight decay | | |
- | Batch size | | |
-
- # Evaluation

  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
  <colgroup>
@@ -127,19 +66,11 @@ print(tokenizer.decode(outputs[0]))
  <col style="width: 7%;">
  <col style="width: 7%;">
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
- <col style="width: 7%;">
- <col style="width: 7%;">
- <col style="width: 7%;">
- <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
  </colgroup>
  <thead>
  <tr>
  <th>Category</th>
  <th>Benchmark</th>
- <th>Llama3.1-8B</th>
- <th>Qwen2-7B</th>
- <th>Qwen2.5-7B</th>
- <th>Falcon3-7B-Base</th>
  <th>Gemma2-9B</th>
  <th>Yi1.5-9B</th>
  <th>Mistral-NeMo-12B</th>
@@ -150,10 +81,6 @@ print(tokenizer.decode(outputs[0]))
  <tr>
  <td rowspan="3">General</td>
  <td>MMLU (5-shot)</td>
- <td>65.2</td>
- <td>70.4</td>
- <td>74.2</td>
- <td>67.5</td>
  <td>0</td>
  <td>69.6</td>
  <td>68.8</td>
@@ -161,10 +88,6 @@ print(tokenizer.decode(outputs[0]))
  </tr>
  <tr>
  <td>MMLU-PRO (5-shot)</td>
- <td>32.7</td>
- <td>42.1</td>
- <td>43.5</td>
- <td>39.2</td>
  <td>0</td>
  <td>39.3</td>
  <td>34.7</td>
@@ -172,10 +95,6 @@ print(tokenizer.decode(outputs[0]))
  </tr>
  <tr>
  <td>IFEval</td>
- <td>12.0</td>
- <td>30.6</td>
- <td>33.9</td>
- <td>34.3</td>
  <td>0</td>
  <td>29.1</td>
  <td>16.1</td>
@@ -184,10 +103,6 @@ print(tokenizer.decode(outputs[0]))
  <tr>
  <td rowspan="2">Math</td>
  <td>GSM8K (5-shot)</td>
- <td>49.4</td>
- <td>77.9</td>
- <td>82.9</td>
- <td>76.2</td>
  <td>69.1</td>
  <td>63.8</td>
  <td>55.3</td>
@@ -195,10 +110,6 @@ print(tokenizer.decode(outputs[0]))
  </tr>
  <tr>
  <td>MATH(4-shot)</td>
- <td>4.1</td>
- <td>17.5</td>
- <td>15.5</td>
- <td>18.0</td>
  <td>0</td>
  <td>9.2</td>
  <td>4.9</td>
@@ -207,10 +118,6 @@ print(tokenizer.decode(outputs[0]))
  <tr>
  <td rowspan="4">Reasoning</td>
  <td>Arc Challenge (25-shot)</td>
- <td>53.4</td>
- <td>57.4</td>
- <td>59.0</td>
- <td>59.6</td>
  <td>63.7</td>
  <td>58.2</td>
  <td>60.6</td>
@@ -218,10 +125,6 @@ print(tokenizer.decode(outputs[0]))
  </tr>
  <tr>
  <td>GPQA (0-shot)</td>
- <td>31.0</td>
- <td>31.9</td>
- <td>33.0</td>
- <td>35.5</td>
  <td>0</td>
  <td>36.6</td>
  <td>28.8</td>
@@ -229,10 +132,6 @@ print(tokenizer.decode(outputs[0]))
  </tr>
  <tr>
  <td>MUSR (0-shot)</td>
- <td>38.0</td>
- <td>44.1</td>
- <td>44.2</td>
- <td>47.3</td>
  <td>0</td>
  <td>43.3</td>
  <td>39.2</td>
@@ -240,10 +139,6 @@ print(tokenizer.decode(outputs[0]))
  </tr>
  <tr>
  <td>BBH (3-shot)</td>
- <td>46.5</td>
- <td>53.3</td>
- <td>54.0</td>
- <td>51.0</td>
  <td>0</td>
  <td>51.3</td>
  <td>50.2</td>
@@ -252,10 +147,6 @@ print(tokenizer.decode(outputs[0]))
  <tr>
  <td rowspan="4">CommonSense Understanding</td>
  <td>PIQA (0-shot)</td>
- <td>80.3</td>
- <td>79.8</td>
- <td>78.7</td>
- <td>77.7</td>
  <td>81.4</td>
  <td>79.8</td>
  <td>81.4</td>
@@ -263,10 +154,6 @@ print(tokenizer.decode(outputs[0]))
  </tr>
  <tr>
  <td>SciQ (0-shot)</td>
- <td>96.3</td>
- <td>95.9</td>
- <td>96.6</td>
- <td>95.3</td>
  <td>97.2</td>
  <td>95.8</td>
  <td>96.4</td>
@@ -274,10 +161,6 @@ print(tokenizer.decode(outputs[0]))
  </tr>
  <tr>
  <td>Winogrande (0-shot)</td>
- <td>74.0</td>
- <td>72.1</td>
- <td>72.9</td>
- <td>71.0</td>
  <td>74.2</td>
  <td>72.7</td>
  <td>73.2</td>
@@ -285,10 +168,6 @@ print(tokenizer.decode(outputs[0]))
  </tr>
  <tr>
  <td>OpenbookQA (0-shot)</td>
- <td>33.4</td>
- <td>35.2</td>
- <td>33.6</td>
- <td>31.4</td>
  <td>34.0</td>
  <td>35.4</td>
  <td>36.4</td>
@@ -300,5 +179,15 @@ print(tokenizer.decode(outputs[0]))

  # Citation
  language:
  - en
+ - fr
+ - es
+ - pt
  tags:
  - falcon3
  ---

+ # Falcon3-7B-Base

+ The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
+ This repository contains **Falcon3-7B-Base**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
+ Falcon3-7B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.

+ ⚠️ **This is a raw, pretrained model, which should be further finetuned for most use cases.**

+ ## Model Details
+ - Architecture
+   - Transformer-based, causal decoder-only architecture
+   - 28 decoder blocks
+   - Grouped-query attention (GQA) for faster inference: 12 query heads and 4 KV heads
+   - Wider head dimension: 256
+   - High RoPE base value (1000042) to support long-context understanding
+   - 32K context length
+   - 131K vocab size
+ - Pretrained on 14 Teratokens of data comprising web, code, STEM, high-quality and multilingual data, using 2048 H100 GPU chips
+ - Supports EN, FR, ES, PT
+ - Developed by [Technology Innovation Institute](https://www.tii.ae)
+ - License: TII Falcon-LLM License 2.0
+ - Model Release Date: December 2024
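As a hedged back-of-the-envelope sketch (my own arithmetic, not from the model card), the figures above can be plugged into two quick calculations: with 4 KV heads instead of the 12 query heads, the per-token KV cache shrinks 3x versus a hypothetical multi-head-attention baseline, and a RoPE base of 1000042 puts the slowest rotary wavelength far beyond the 32K context window. The bfloat16 cache dtype and the MHA comparison are assumptions for illustration.

```python
import math

# Architecture figures from the list above; cache dtype assumed bfloat16.
n_layers, n_q_heads, n_kv_heads, head_dim = 28, 12, 4, 256
bytes_per_value = 2  # bfloat16

def kv_cache_bytes_per_token(kv_heads: int) -> int:
    # One key and one value vector per KV head per layer.
    return 2 * n_layers * kv_heads * head_dim * bytes_per_value

gqa = kv_cache_bytes_per_token(n_kv_heads)
mha = kv_cache_bytes_per_token(n_q_heads)  # hypothetical MHA baseline
print(f"KV cache per token: {gqa} B with GQA vs {mha} B with MHA ({mha // gqa}x saving)")

# Longest rotary wavelength for standard RoPE: 2*pi / base**(-(head_dim-2)/head_dim)
rope_base = 1000042
slowest_freq = rope_base ** (-(head_dim - 2) / head_dim)
longest_wavelength = 2 * math.pi / slowest_freq
print(f"Longest rotary wavelength: {longest_wavelength:,.0f} positions (>> 32K context)")
```

The 3x KV-cache saving is one concrete reason GQA gives "faster inference" at long context lengths.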

+ ## Getting started

  <details>
  <summary> Click to expand </summary>

  ```python
  import torch
+ from transformers import pipeline
+
+ pipe = pipeline(
+     "text-generation",
+     model="tiiuae/Falcon3-7B-Base",
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+ response = pipe("Question: How many hours in one day? Answer: ")
+ print(response[0]['generated_text'])
  ```

  </details>
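A rough sketch of why the example above passes `torch_dtype=torch.bfloat16`: it roughly halves the weight memory versus default fp32. The ~7e9 parameter count is an approximation I am assuming for illustration; real usage adds KV cache and activations on top.

```python
def model_weight_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB for a given parameter dtype."""
    return n_params * bytes_per_param / 2**30

n_params = 7e9  # Falcon3-7B-Base, approximate
fp32 = model_weight_gib(n_params, 4)  # default torch.float32 weights
bf16 = model_weight_gib(n_params, 2)  # torch.bfloat16 weights
print(f"fp32 weights: ~{fp32:.1f} GiB; bfloat16 weights: ~{bf16:.1f} GiB")
```

Under these assumptions bfloat16 brings the weights to roughly 13 GiB, which is what makes a single modern accelerator (with `device_map="auto"`) practical.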

+ <br>

+ # Benchmarks
+ We report our internal pipeline benchmarks in the following table:

  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
  <colgroup>
 
  <col style="width: 7%;">
  <col style="width: 7%;">
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
  </colgroup>
  <thead>
  <tr>
  <th>Category</th>
  <th>Benchmark</th>
  <th>Gemma2-9B</th>
  <th>Yi1.5-9B</th>
  <th>Mistral-NeMo-12B</th>
  <tr>
  <td rowspan="3">General</td>
  <td>MMLU (5-shot)</td>
  <td>0</td>
  <td>69.6</td>
  <td>68.8</td>
  </tr>
  <tr>
  <td>MMLU-PRO (5-shot)</td>
  <td>0</td>
  <td>39.3</td>
  <td>34.7</td>
  </tr>
  <tr>
  <td>IFEval</td>
  <td>0</td>
  <td>29.1</td>
  <td>16.1</td>
  <tr>
  <td rowspan="2">Math</td>
  <td>GSM8K (5-shot)</td>
  <td>69.1</td>
  <td>63.8</td>
  <td>55.3</td>
  </tr>
  <tr>
  <td>MATH(4-shot)</td>
  <td>0</td>
  <td>9.2</td>
  <td>4.9</td>
  <tr>
  <td rowspan="4">Reasoning</td>
  <td>Arc Challenge (25-shot)</td>
  <td>63.7</td>
  <td>58.2</td>
  <td>60.6</td>
  </tr>
  <tr>
  <td>GPQA (0-shot)</td>
  <td>0</td>
  <td>36.6</td>
  <td>28.8</td>
  </tr>
  <tr>
  <td>MUSR (0-shot)</td>
  <td>0</td>
  <td>43.3</td>
  <td>39.2</td>
  </tr>
  <tr>
  <td>BBH (3-shot)</td>
  <td>0</td>
  <td>51.3</td>
  <td>50.2</td>
  <tr>
  <td rowspan="4">CommonSense Understanding</td>
  <td>PIQA (0-shot)</td>
  <td>81.4</td>
  <td>79.8</td>
  <td>81.4</td>
  </tr>
  <tr>
  <td>SciQ (0-shot)</td>
  <td>97.2</td>
  <td>95.8</td>
  <td>96.4</td>
  </tr>
  <tr>
  <td>Winogrande (0-shot)</td>
  <td>74.2</td>
  <td>72.7</td>
  <td>73.2</td>
  </tr>
  <tr>
  <td>OpenbookQA (0-shot)</td>
  <td>34.0</td>
  <td>35.4</td>
  <td>36.4</td>
 
  # Citation
+ If the Falcon3 family of models was helpful to your work, feel free to give us a cite.
+
+ ```
+ @misc{Falcon3,
+     title = {Falcon 3 family of Open Foundation Models},
+     author = {TII Team},
+     month = {December},
+     year = {2024}
+ }
+ ```