Update README.md #4
opened by kasper-piskorski

- README.md  +18 −27
- tokenizer_config.json  +1 −1
README.md
CHANGED

@@ -7,35 +7,30 @@ language:
 tags:
 - falcon3
 base_model: tiiuae/Falcon3-7B-Base
-license: other
-license_name: falcon-llm-license
+license: other
+license_name: falcon-llm-license
 license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
-library_name: transformers
 ---
 
-<div align="center">
-<img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/general/falco3-logo.png" alt="drawing" width="500"/>
-</div>
-
 # Falcon3-7B-Instruct
 
-**Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
+**Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
 
-This repository contains the **Falcon3-7B-Instruct**. It achieves state
-Falcon3-7B-Instruct supports 4 languages (
+This repository contains the **Falcon3-7B-Instruct**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code and mathematics tasks.
+Falcon3-7B-Instruct supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K.
 
 ## Model Details
 - Architecture
-  - Transformer
+  - Transformer-based causal decoder-only architecture
   - 28 decoder blocks
-  - Grouped
+  - Grouped Query Attention (GQA) for faster inference: 12 query heads and 4 key-value heads
   - Wider head dimension: 256
   - High RoPE value to support long context understanding: 1000042
   - Uses SwiGLU and RMSNorm
   - 32K context length
   - 131K vocab size
-- Pretrained on 14 Teratokens of datasets comprising of web, code, STEM, high quality and mutlilingual data using
--
+- Pretrained on 14 Teratokens of data comprising web, code, STEM, high-quality and multilingual data, using 2048 H100 GPU chips
+- Posttrained on 1.2 million samples of STEM, conversational, code, safety and function call data
 - Supports EN, FR, ES, PT
 - Developed by [Technology Innovation Institute](https://www.tii.ae)
 - License: TII Falcon-LLM License 2.0

@@ -91,10 +86,7 @@ print(response)
 <br>
 
 ## Benchmarks
-We report in the following table our internal pipeline benchmarks
-- We use [lm-evaluation harness](https://github.com/EleutherAI/lm-evaluation-harness).
-- We report **raw scores** obtained by applying chat template **without fewshot_as_multiturn** (unlike Llama3.1).
-- We use same batch-size across all models.
+We report in the following table our internal pipeline benchmarks:
 
 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
 <colgroup>

@@ -150,7 +142,7 @@ We report in the following table our internal pipeline benchmarks.
 <td>MATH Lvl-5 (4-shot)</td>
 <td>10.4</td>
 <td>26</td>
-<td><b>
+<td><b>33.1</b></td>
 </tr>
 <tr>
 <td rowspan="5">Reasoning</td>

@@ -211,15 +203,15 @@ We report in the following table our internal pipeline benchmarks.
 <tr>
 <td rowspan="2">Instructions following</td>
 <td>MT-Bench (avg)</td>
-<td>7.
-<td><b>8.
-<td>8.
+<td>7.86</td>
+<td><b>8.54</b></td>
+<td>8.36</td>
 </tr>
 <tr>
-<td>
-<td>26.
+<td>Alpaca (WC)</td>
+<td>26.57</td>
 <td><b>31.5</b></td>
-<td>26.
+<td>26.13</td>
 </tr>
 <tr>
 <td>Tool use</td>

@@ -231,7 +223,6 @@ We report in the following table our internal pipeline benchmarks.
 </tbody>
 </table>
 
-
 ## Technical Report
 Coming soon....
 

@@ -245,4 +236,4 @@ If Falcon3 family were helpful to your work, feel free to give us a cite.
 month = {December},
 year = {2024}
 }
-```
+```
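For reference, the architecture details listed under Model Details above (28 decoder blocks, GQA with 12 query and 4 key-value heads, head dimension 256, RoPE base 1000042, 32K context, 131K vocab) can be cross-checked against the published config. A minimal sketch, assuming the checkpoint exposes a Llama-style config via `transformers` (the attribute names are an assumption, not part of this PR):

```python
# Minimal sketch: cross-check the Model Details bullets against the config.
# Assumes a Llama-style config class; attribute names may differ otherwise.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("tiiuae/Falcon3-7B-Instruct")

print(config.num_hidden_layers)                                 # card: 28 decoder blocks
print(config.num_attention_heads, config.num_key_value_heads)   # card: 12 query / 4 KV heads (GQA)
print(config.hidden_size // config.num_attention_heads)         # card: head dimension 256
print(config.rope_theta)                                        # card: RoPE base 1000042
print(config.max_position_embeddings)                           # card: 32K context length
print(config.vocab_size)                                        # card: ~131K vocab size
```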
tokenizer_config.json
CHANGED

@@ -16219,7 +16219,7 @@
 ">>PASSWORD<<",
 ">>KEY<<"
 ],
-"chat_template": "{%
+"chat_template": "{% for message in messages %}{% if message['role'] == 'system' %}{{ '<|system|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'user' %}{{ '<|user|>\n' + message['content'] + '\n' }}{% elif message['role'] == 'assistant' %}{% if not loop.last %}{{ '<|assistant|>\n' + message['content'] + eos_token + '\n' }}{% else %}{{ '<|assistant|>\n' + message['content'] + eos_token }}{% endif %}{% endif %}{% if loop.last and add_generation_prompt %}{{ '<|assistant|>\n' }}{% endif %}{% endfor %}",
 "clean_up_tokenization_spaces": true,
 "eos_token": "<|endoftext|>",
 "extra_special_tokens": {},
|