tiiuae/Falcon3-7B-Base

@@ -1,133 +1,62 @@
 ---
 language:
 - en
-- es
-- pt
 tags:
 - falcon3
 ---
-#  Table of Contents
-0. [TL;DR](#TL;DR)
-1. [Model Details](#model-details)
-2. [Usage](#usage)
-3. [Training Details](#training-details)
-4. [Evaluation](#evaluation)
-# TL;DR
-# Model Details
 ⚠️ **This is a raw, pretrained model that should be further fine-tuned for most use cases.**
-## Model Description
-- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
-- **Model type:** Causal decoder-only
-- **Architecture:** Transformer-based
-- **Language(s) (NLP):** Mainly English
-- **License:** TII Falcon-LLM License 2.0
-<br>
-# Usage
-The example scripts below show how to use the model with `transformers` (make sure you have the latest release, or a build from source):
-## Using the PyTorch model with 🤗 transformers
-### Running the model on a CPU
-<details>
-<summary> Click to expand </summary>
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
-model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base")
-input_text = "Question: How many hours in one day? Answer: "
-input_ids = tokenizer(input_text, return_tensors="pt").input_ids
-outputs = model.generate(input_ids)
-print(tokenizer.decode(outputs[0]))
-```
-</details>
-### Running the model on a GPU
-<details>
-<summary> Click to expand </summary>
-```python
-# pip install accelerate
-from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
-model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", device_map="auto")
-input_text = "Question: How many hours in one day? Answer: "
-input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
-outputs = model.generate(input_ids)
-print(tokenizer.decode(outputs[0]))
-```
-</details>
-### Running the model on a GPU using `torch.compile`
 <details>
 <summary> Click to expand </summary>
 ```python
 import torch
-from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-7B-Base")
-model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-7B-Base", torch_dtype=torch.bfloat16).to(0)
-model = torch.compile(model)
-input_text = "Question: How many hours in one day? Answer: "
-input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
-outputs = model.generate(input_ids)
-print(tokenizer.decode(outputs[0]))
 ```
 </details>
-# Training Details
-## Training Data
-Falcon3-7B is trained on 15 Gigatokens of data comprising web, code, STEM, high-quality, and multilingual sources.
-## Training Procedure
-Falcon3-7B is trained on 256 H100 nodes (world size 2048).
-### Training Hyperparameters
-| **Hyperparameter** | **Value**  | **Comment**                           |
-|--------------------|------------|---------------------------------------|
-| Precision          | `bfloat16` |                                       |
-| Optimizer          | AdamW      |                                       |
-| Max learning rate  | 6e-4       | Following a WSD (warmup-stable-decay) |
-|                    |            | learning rate scheduler               |
-| Weight decay       | 1e-1       |                                       |
-| z-loss             | 1e-4       |                                       |
-| Batch size         | Variable   | Batch size was gradually increased    |
-|                    |            | during the training                   |
-# Evaluation
 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
     <colgroup>
@@ -136,6 +65,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
         <col style="width: 7%;">
         <col style="width: 7%;">
         <col style="width: 7%;">
         <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
     </colgroup>
     <thead>
@@ -145,6 +75,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <th>Llama3.1-8B</th>
             <th>Qwen2-7B</th>
             <th>Qwen2.5-7B</th>
             <th>Falcon3-7B-Base</th>
         </tr>
     </thead>
@@ -155,6 +86,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>65.2</td>
             <td>70.4</td>
             <td>74.2</td>
             <td>67.5</td>
         </tr>
         <tr>
@@ -162,6 +94,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>32.7</td>
             <td>42.1</td>
             <td>43.5</td>
             <td>39.2</td>
         </tr>
         <tr>
@@ -169,6 +102,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>12.0</td>
             <td>30.6</td>
             <td>33.9</td>
             <td>34.3</td>
         </tr>
         <tr>
@@ -177,6 +111,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>49.4</td>
             <td>77.9</td>
             <td>82.9</td>
             <td>76.2</td>
         </tr>
         <tr>
@@ -184,6 +119,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>4.1</td>
             <td>17.5</td>
             <td>15.5</td>
             <td>18.0</td>
         </tr>
         <tr>
@@ -192,6 +128,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>53.4</td>
             <td>57.4</td>
             <td>59.0</td>
             <td>59.6</td>
         </tr>
         <tr>
@@ -199,6 +136,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>31.0</td>
             <td>31.9</td>
             <td>33.0</td>
             <td>35.5</td>
         </tr>
         <tr>
@@ -206,6 +144,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>38.0</td>
             <td>44.1</td>
             <td>44.2</td>
             <td>47.3</td>
         </tr>
         <tr>
@@ -213,6 +152,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>46.5</td>
             <td>53.3</td>
             <td>54.0</td>
             <td>51.0</td>
         </tr>
         <tr>
@@ -221,6 +161,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>80.3</td>
             <td>79.8</td>
             <td>78.7</td>
             <td>77.7</td>
         </tr>
         <tr>
@@ -228,6 +169,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>96.3</td>
             <td>95.9</td>
             <td>96.6</td>
             <td>95.3</td>
         </tr>
         <tr>
@@ -235,6 +177,7 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>74.0</td>
             <td>72.1</td>
             <td>72.9</td>
             <td>71.0</td>
         </tr>
         <tr>
@@ -242,11 +185,21 @@ Falcon3-7B is trained on 256 H100 nodes (world size 2048).
             <td>33.4</td>
             <td>35.2</td>
             <td>33.6</td>
             <td>31.4</td>
         </tr>
     </tbody>
 </table>
-# Citation

 ---
 language:
 - en
 tags:
 - falcon3
 ---
+# Falcon3-7B-Base
+The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
+This repository contains **Falcon3-7B-Base**. It achieves state-of-the-art results (at the time of release) on reasoning, language understanding, instruction following, code, and mathematics tasks.
+Falcon3-7B-Base supports four languages (English, French, Spanish, and Portuguese) and a context length of up to 32K tokens.
 ⚠️ **This is a raw, pretrained model that should be further fine-tuned for most use cases.**
+## Model Details
+- Architecture
+  - Transformer-based, causal decoder-only architecture
+  - 28 decoder blocks
+  - grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
+  - wider head dimension: 256
+  - high RoPE base value to support long-context understanding: 1000042
+  - 32k context length
+  - 131k vocab size
+- Pretrained on 14 Gigatokens of data comprising web, code, STEM, high-quality, and multilingual sources, using 2048 H100 GPUs
+- Supports EN, FR, ES, PT
+- Developed by [Technology Innovation Institute](https://www.tii.ae)
+- License: TII Falcon-LLM License 2.0
+- Model Release Date: December 2024
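To make the GQA figures above concrete, here is a rough sketch of the key/value cache arithmetic they imply. It uses only the numbers listed in the architecture bullets (28 decoder blocks, 12 query heads, 4 KV heads, head dimension 256). The 2 bytes per value (bfloat16), the reading of "32k" as 32768 tokens, the batch size of 1, and the multi-head comparison with 12 KV heads are assumptions made for illustration, not measurements from the model authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Attention shape taken from the model card's architecture list."""
    num_layers: int = 28
    num_query_heads: int = 12
    num_kv_heads: int = 4
    head_dim: int = 256


def kv_cache_bytes_per_token(cfg, bytes_per_value=2, num_kv_heads=None):
    """Bytes of KV cache stored per generated token.

    Each layer stores one key and one value vector per KV head,
    hence the leading factor of 2. bytes_per_value=2 assumes bfloat16.
    """
    heads = cfg.num_kv_heads if num_kv_heads is None else num_kv_heads
    return 2 * cfg.num_layers * heads * cfg.head_dim * bytes_per_value


cfg = AttentionConfig()
context_tokens = 32 * 1024  # assumption: "32k context" means 32768 tokens

gqa_bytes = kv_cache_bytes_per_token(cfg)
# Hypothetical baseline: plain multi-head attention, one KV head per query head.
mha_bytes = kv_cache_bytes_per_token(cfg, num_kv_heads=cfg.num_query_heads)

gib = 2 ** 30
print(f"GQA: {gqa_bytes} bytes/token, {gqa_bytes * context_tokens / gib:.1f} GiB at full context")
print(f"MHA: {mha_bytes} bytes/token, {mha_bytes * context_tokens / gib:.1f} GiB at full context")
# GQA: 114688 bytes/token, 3.5 GiB at full context
# MHA: 344064 bytes/token, 10.5 GiB at full context
```

Under these assumptions, sharing each KV head across three query heads cuts the cache to one third of the multi-head baseline for a single sequence. Actual memory use depends on the runtime, the batch size, and the cache implementation.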
+## Getting started
 <details>
 <summary> Click to expand </summary>
 ```python
 import torch
+from transformers import pipeline
+pipe = pipeline(
+    "text-generation",
+    model="tiiuae/Falcon3-7B-Base",
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+response = pipe("Question: How many hours in one day? Answer: ")
+print(response[0]['generated_text'])
 ```
 </details>
+<br>
+# Benchmarks
+The following table reports results from our internal benchmark pipeline:
 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
     <colgroup>
         <col style="width: 7%;">
         <col style="width: 7%;">
         <col style="width: 7%;">
+        <col style="width: 7%;">
         <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
     </colgroup>
     <thead>
             <th>Llama3.1-8B</th>
             <th>Qwen2-7B</th>
             <th>Qwen2.5-7B</th>
+            <th>gemma-2-9b</th>
             <th>Falcon3-7B-Base</th>
         </tr>
     </thead>
             <td>65.2</td>
             <td>70.4</td>
             <td>74.2</td>
+            <td>-</td>
             <td>67.5</td>
         </tr>
         <tr>
             <td>32.7</td>
             <td>42.1</td>
             <td>43.5</td>
+            <td>-</td>
             <td>39.2</td>
         </tr>
         <tr>
             <td>12.0</td>
             <td>30.6</td>
             <td>33.9</td>
+            <td>-</td>
             <td>34.3</td>
         </tr>
         <tr>
             <td>49.4</td>
             <td>77.9</td>
             <td>82.9</td>
+            <td>-</td>
             <td>76.2</td>
         </tr>
         <tr>
             <td>4.1</td>
             <td>17.5</td>
             <td>15.5</td>
+            <td>-</td>
             <td>18.0</td>
         </tr>
         <tr>
             <td>53.4</td>
             <td>57.4</td>
             <td>59.0</td>
+            <td>-</td>
             <td>59.6</td>
         </tr>
         <tr>
             <td>31.0</td>
             <td>31.9</td>
             <td>33.0</td>
+            <td>-</td>
             <td>35.5</td>
         </tr>
         <tr>
             <td>38.0</td>
             <td>44.1</td>
             <td>44.2</td>
+            <td>-</td>
             <td>47.3</td>
         </tr>
         <tr>
             <td>46.5</td>
             <td>53.3</td>
             <td>54.0</td>
+            <td>-</td>
             <td>51.0</td>
         </tr>
         <tr>
             <td>80.3</td>
             <td>79.8</td>
             <td>78.7</td>
+            <td>-</td>
             <td>77.7</td>
         </tr>
         <tr>
             <td>96.3</td>
             <td>95.9</td>
             <td>96.6</td>
+            <td>-</td>
             <td>95.3</td>
         </tr>
         <tr>
             <td>74.0</td>
             <td>72.1</td>
             <td>72.9</td>
+            <td>-</td>
             <td>71.0</td>
         </tr>
         <tr>
             <td>33.4</td>
             <td>35.2</td>
             <td>33.6</td>
+            <td>-</td>
             <td>31.4</td>
         </tr>
     </tbody>
 </table>
+# Citation
+If the Falcon3 family of models was helpful to your work, feel free to cite us.
+```
+@misc{Falcon3,
+    title = {Falcon 3 family of Open Foundation Models},
+    author = {TII Team},
+    month = {December},
+    year = {2024}
+}
+```