puneeshkhanna
committed on
Commit
•
a64ebc0
1
Parent(s):
3097aa6
Update README.md
README.md
CHANGED
@@ -23,13 +23,13 @@ Falcon3-7B-Instruct supports 4 languages (english, french, spanish, portuguese)
  - Architecture
  - Transformer based causal decoder only architecture
  - 28 decoder blocks
- - Grouped query attention (GQA) for faster inference: 12 query heads and 4
+ - Grouped query attention (GQA) for faster inference: 12 query heads and 4 key value heads
  - Wider head dimension: 256
  - High RoPE value to support long context understanding: 1000042
  - Uses SwiGLU and RMSNorm
- -
- -
- - Pretrained on 14
+ - 32K context length
+ - 131K vocab size
+ - Pretrained on 14 Teratokens of datasets comprising web, code, STEM, high quality and multilingual data using 2048 H100 GPU chips
  - Post-trained on 1.2 million samples of STEM, conversations, code, safety and function call data
  - Supports EN, FR, ES, PT
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
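The architecture bullets in this hunk (28 decoder blocks, 12 query heads, 4 key value heads, head dimension 256, 32K context) are enough for a rough estimate of why GQA helps at inference time. The sketch below is only an illustration of that arithmetic, not part of the README or the commit. It assumes that "32K" means 32 * 1024 tokens and that cached keys and values use a 2-byte dtype such as bfloat16; neither detail is stated in the diff, and the helper name `kv_cache_bytes` is hypothetical.

```python
# Values taken from the README lines in this hunk.
NUM_LAYERS = 28        # decoder blocks
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 4
HEAD_DIM = 256

# Assumptions (not stated in the README): "32K" is 32 * 1024 tokens,
# and each cached value occupies 2 bytes (e.g. bfloat16).
CONTEXT_TOKENS = 32 * 1024
BYTES_PER_VALUE = 2


def kv_cache_bytes(num_kv_heads: int, tokens: int = CONTEXT_TOKENS) -> int:
    """Bytes needed to cache keys and values for a single sequence."""
    # The leading factor 2 accounts for storing both a key and a value.
    per_token = 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE
    return per_token * tokens


# How many query heads share each key/value head under GQA.
queries_per_kv_head = NUM_QUERY_HEADS // NUM_KV_HEADS

# Combined width of all query heads.
query_width = NUM_QUERY_HEADS * HEAD_DIM

gqa_bytes = kv_cache_bytes(NUM_KV_HEADS)
# Hypothetical comparison point: one key/value head per query head.
mha_bytes = kv_cache_bytes(NUM_QUERY_HEADS)

GIB = 1024 ** 3
print(f"query heads per KV head: {queries_per_kv_head}")
print(f"combined query width:    {query_width}")
print(f"KV cache with 4 KV heads:  {gqa_bytes / GIB:.1f} GiB")
print(f"KV cache with 12 KV heads: {mha_bytes / GIB:.1f} GiB")
```

Under these assumptions, a full-length sequence needs about 3.5 GiB of key/value cache with 4 key value heads, versus about 10.5 GiB if every query head had its own key/value head. Real deployments may differ because of cache dtype, batching, and framework overhead.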