puneeshkhanna committed on
Commit
a64ebc0
1 Parent(s): 3097aa6

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -23,13 +23,13 @@ Falcon3-7B-Instruct supports 4 languages (english, french, spanish, portuguese)
  - Architecture
  - Transformer based causal decoder only architecture
  - 28 decoder blocks
- - Grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
+ - Grouped query attention (GQA) for faster inference: 12 query heads and 4 key value heads
  - Wider head dimension: 256
  - High RoPE value to support long context understanding: 1000042
  - Uses SwiGLU and RMSNorm
- - 32k context length
- - 131k vocab size
- - Pretrained on 14 Gigatokens of datasets comprising of web, code, STEM, high quality and mutlilingual data using 2048 H100 GPU chips
+ - 32K context length
+ - 131K vocab size
+ - Pretrained on 14 Teratokens of datasets comprising of web, code, STEM, high quality and mutlilingual data using 2048 H100 GPU chips
  - Postrained on 1.2 million samples of STEM, conversations, code, safety and function call data
  - Supports EN, FR, ES, PT
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
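
To show why the GQA line claims faster inference, the sketch below estimates the key/value cache size for one sequence at full context, using the figures from the list (28 decoder blocks, 12 query heads, 4 key value heads, head dimension 256). It is only back-of-the-envelope arithmetic: the function name `kv_cache_bytes`, the 2-byte (bf16/fp16) element size, and reading "32K" as 32 * 1024 tokens are assumptions and are not stated in the README.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """Bytes needed to cache keys and values for a single sequence.

    bytes_per_elem=2 assumes bf16/fp16 storage (an assumption, not from the README).
    """
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Figures taken from the architecture list; 32K is assumed to mean 32 * 1024 tokens.
LAYERS, Q_HEADS, KV_HEADS, HEAD_DIM, CONTEXT = 28, 12, 4, 256, 32 * 1024

gqa = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
# Hypothetical baseline: standard multi-head attention with one KV head per query head.
mha = kv_cache_bytes(LAYERS, Q_HEADS, HEAD_DIM, CONTEXT)

print(f"GQA cache: {gqa / 2**30:.1f} GiB")  # 3.5 GiB
print(f"MHA cache: {mha / 2**30:.1f} GiB")  # 10.5 GiB
print(f"Reduction: {mha // gqa}x")          # 3x
```

Under these assumptions, sharing each key value head across three query heads cuts the cache to one third, which reduces the memory traffic per decoded token at long context.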