melaseddik committed
Commit b6bed06 · 1 Parent(s): 922a098

Update README.md

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -8,19 +8,19 @@ tags:
 - falcon3
 ---
 
-# Falcon3-7B-Base
+# Falcon3-10B-Base
 
 **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
 
-This repository contains the **Falcon3-7B-Base**. It achieves state of art results (at release's time) on reasoning, language understanding, instruction following, code and mathematics tasks.
-Falcon3-7B-Base supports 4 languages (english, french, spanish, portuguese) and a context length up to 32K.
+This repository contains the **Falcon3-10B-Base**. It achieves state of art results (at release's time) on reasoning, language understanding, instruction following, code and mathematics tasks.
+Falcon3-10B-Base supports 4 languages (english, french, spanish, portuguese) and a context length up to 32K.
 
 ⚠️ **This is a raw, pretrained model, which should be further finetuned for most usecases.**
 
 ## Model Details
 - Architecture
   - transformer based causal decoder only architecture
-  - 28 decoder blocks
+  - 40 decoder blocks
   - grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
   - wider head dimension: 256
   - high RoPE value to support long context understanding: 1000042
@@ -44,7 +44,7 @@ from transformers import pipeline
 
 pipe = pipeline(
     "text-generation",
-    model="tiiuae/Falcon3-7B-Base",
+    model="tiiuae/Falcon3-10B-Base",
     torch_dtype=torch.bfloat16,
     device_map="auto"
 )
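
For context, a complete, runnable version of the quickstart snippet this second hunk touches might look like the sketch below. The pipeline arguments mirror the README; `from transformers import pipeline` comes from the hunk header, `import torch` is inferred from the use of `torch.bfloat16`, and the prompt and `max_new_tokens` value are illustrative assumptions, not part of the commit.

```python
# Minimal sketch of the README quickstart after this commit.
# Assumes transformers, accelerate, and torch are installed.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="tiiuae/Falcon3-10B-Base",  # model id updated by this commit
    torch_dtype=torch.bfloat16,       # halves memory vs. float32
    device_map="auto",                # let accelerate place the ~10B weights
)

# Base (non-instruct) model: give it plain text to continue.
result = pipe("The capital of France is", max_new_tokens=32)
print(result[0]["generated_text"])
```

Since this is a base model rather than an instruct variant, it continues raw text instead of following chat-style prompts, which is why the example uses a plain completion prompt.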