tiiuae/Falcon3-10B-Base

@@ -1,123 +1,62 @@
----
 language:
 - en
 tags:
 - falcon3
 ---
-#  Table of Contents
-0. [TL;DR](#TL;DR)
-1. [Model Details](#model-details)
-2. [Usage](#usage)
-3. [Training Details](#training-details)
-4. [Evaluation](#evaluation)
-# TL;DR
-# Model Details
-## Model Description
-- **Developed by:** [https://www.tii.ae](https://www.tii.ae)
-- **Model type:** Causal decoder-only
-- **Architecture:** Transformer-base
-- **Language(s) (NLP):** Mainly English
-- **License:** TII Falcon-LLM License 2.0
-<br>
-# Usage
-Find below some example scripts on how to use the model in `transformers` (Make sure to have the latest transformers, or the one built from source):
-## Using the Pytorch model with 🤗 transformers
-### Running the model on a CPU
-<details>
-<summary> Click to expand </summary>
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
-model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base")
-input_text = "Question: How many hours in one day? Answer: "
-input_ids = tokenizer(input_text, return_tensors="pt").input_ids
-outputs = model.generate(input_ids)
-print(tokenizer.decode(outputs[0]))
-```
-</details>
-### Running the model on a GPU
-<details>
-<summary> Click to expand </summary>
-```python
-# pip install accelerate
-from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
-model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base", device_map="auto")
-input_text = "Question: How many hours in one day? Answer: "
-input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
-outputs = model.generate(input_ids)
-print(tokenizer.decode(outputs[0]))
-```
-</details>
-### Running the model on a GPU using `torch.compile`
 <details>
 <summary> Click to expand </summary>
 ```python
 import torch
-from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("tiiuae/Falcon3-10B-Base")
-model = AutoModelForCausalLM.from_pretrained("tiiuae/Falcon3-10B-Base", torch_dtype=torch.bfloat16).to(0)
-model = torch.compile(model)
-input_text = "Question: How many hours in one day? Answer: "
-input_ids = tokenizer(input_text, return_tensors="pt").input_ids.to("cuda")
-outputs = model.generate(input_ids)
-print(tokenizer.decode(outputs[0]))
 ```
 </details>
-# Training Details
-## Training Data
-## Training Procedure
-### Training Hyperparameters
-| **Hyperparameter** | **Value**  | **Comment**                               |
-|--------------------|------------|-------------------------------------------|
-| Precision          | `bfloat16` |                                           |
-| Optimizer          | AdamW      |                                           |
-| Max learning rate  |      | Following a WSD (warmup-stable-decay) learning rate schedule |
-| Weight decay       |        |                                           |
-| Batch size         |        |                                           |
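The hyperparameter table above names a WSD (warmup-stable-decay) learning-rate schedule but leaves the values blank. Below is a minimal sketch of the schedule's three-phase shape. It assumes a linear warmup and a linear decay, and every number in the example is hypothetical. These are not the values used to train Falcon3.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Warmup-stable-decay learning rate (illustrative sketch)."""
    # Phase 1: linear warmup from 0 up to max_lr (linear shape is an assumption).
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    # Phase 2: hold the learning rate constant at its peak.
    if step < warmup_steps + stable_steps:
        return max_lr
    # Phase 3: linear decay from max_lr down to min_lr (decay shape is an assumption).
    decay_step = step - warmup_steps - stable_steps
    if decay_step < decay_steps:
        progress = decay_step / decay_steps
        return max_lr + (min_lr - max_lr) * progress
    # After the schedule ends, stay at min_lr.
    return min_lr


if __name__ == "__main__":
    # Hypothetical settings for illustration only; the card leaves max LR blank.
    for s in (0, 50, 100, 500, 900, 950, 1000):
        print(s, wsd_lr(s, max_lr=1e-3, warmup_steps=100, stable_steps=800, decay_steps=100))
```

The long constant phase is what distinguishes WSD from a cosine schedule. The learning rate stays at its peak for most of training and only drops during a short final phase.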
-# Evaluation
 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
     <colgroup>
@@ -127,19 +66,11 @@ print(tokenizer.decode(outputs[0]))
         <col style="width: 7%;">
         <col style="width: 7%;">
         <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
-        <col style="width: 7%;">
-        <col style="width: 7%;">
-        <col style="width: 7%;">
-        <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
     </colgroup>
     <thead>
         <tr>
             <th>Category</th>
             <th>Benchmark</th>
-            <th>Llama3.1-8B</th>
-            <th>Qwen2-7B</th>
-            <th>Qwen2.5-7B</th>
-            <th>Falcon3-7B-Base</th>
             <th>Gemma2-9B</th>
             <th>Yi1.5-9B</th>
             <th>Mistral-NeMo-12B</th>
@@ -150,10 +81,6 @@ print(tokenizer.decode(outputs[0]))
         <tr>
             <td rowspan="3">General</td>
             <td>MMLU (5-shot)</td>
-            <td>65.2</td>
-            <td>70.4</td>
-            <td>74.2</td>
-            <td>67.5</td>
             <td>0</td>
             <td>69.6</td>
             <td>68.8</td>
@@ -161,10 +88,6 @@ print(tokenizer.decode(outputs[0]))
         </tr>
         <tr>
             <td>MMLU-PRO (5-shot)</td>
-            <td>32.7</td>
-            <td>42.1</td>
-            <td>43.5</td>
-            <td>39.2</td>
             <td>0</td>
             <td>39.3</td>
             <td>34.7</td>
@@ -172,10 +95,6 @@ print(tokenizer.decode(outputs[0]))
         </tr>
         <tr>
             <td>IFEval</td>
-            <td>12.0</td>
-            <td>30.6</td>
-            <td>33.9</td>
-            <td>34.3</td>
             <td>0</td>
             <td>29.1</td>
             <td>16.1</td>
@@ -184,10 +103,6 @@ print(tokenizer.decode(outputs[0]))
         <tr>
             <td rowspan="2">Math</td>
             <td>GSM8K (5-shot)</td>
-            <td>49.4</td>
-            <td>77.9</td>
-            <td>82.9</td>
-            <td>76.2</td>
             <td>69.1</td>
             <td>63.8</td>
             <td>55.3</td>
@@ -195,10 +110,6 @@ print(tokenizer.decode(outputs[0]))
         </tr>
         <tr>
             <td>MATH(4-shot)</td>
-            <td>4.1</td>
-            <td>17.5</td>
-            <td>15.5</td>
-            <td>18.0</td>
             <td>0</td>
             <td>9.2</td>
             <td>4.9</td>
@@ -207,10 +118,6 @@ print(tokenizer.decode(outputs[0]))
         <tr>
             <td rowspan="4">Reasoning</td>
             <td>Arc Challenge (25-shot)</td>
-            <td>53.4</td>
-            <td>57.4</td>
-            <td>59.0</td>
-            <td>59.6</td>
             <td>63.7</td>
             <td>58.2</td>
             <td>60.6</td>
@@ -218,10 +125,6 @@ print(tokenizer.decode(outputs[0]))
         </tr>
         <tr>
             <td>GPQA (0-shot)</td>
-            <td>31.0</td>
-            <td>31.9</td>
-            <td>33.0</td>
-            <td>35.5</td>
             <td>0</td>
             <td>36.6</td>
             <td>28.8</td>
@@ -229,10 +132,6 @@ print(tokenizer.decode(outputs[0]))
         </tr>
         <tr>
             <td>MUSR (0-shot)</td>
-            <td>38.0</td>
-            <td>44.1</td>
-            <td>44.2</td>
-            <td>47.3</td>
             <td>0</td>
             <td>43.3</td>
             <td>39.2</td>
@@ -240,10 +139,6 @@ print(tokenizer.decode(outputs[0]))
         </tr>
         <tr>
             <td>BBH (3-shot)</td>
-            <td>46.5</td>
-            <td>53.3</td>
-            <td>54.0</td>
-            <td>51.0</td>
             <td>0</td>
             <td>51.3</td>
             <td>50.2</td>
@@ -252,10 +147,6 @@ print(tokenizer.decode(outputs[0]))
         <tr>
             <td rowspan="4">CommonSense Understanding</td>
             <td>PIQA (0-shot)</td>
-            <td>80.3</td>
-            <td>79.8</td>
-            <td>78.7</td>
-            <td>77.7</td>
             <td>81.4</td>
             <td>79.8</td>
             <td>81.4</td>
@@ -263,10 +154,6 @@ print(tokenizer.decode(outputs[0]))
         </tr>
         <tr>
             <td>SciQ (0-shot)</td>
-            <td>96.3</td>
-            <td>95.9</td>
-            <td>96.6</td>
-            <td>95.3</td>
             <td>97.2</td>
             <td>95.8</td>
             <td>96.4</td>
@@ -274,10 +161,6 @@ print(tokenizer.decode(outputs[0]))
         </tr>
         <tr>
             <td>Winogrande (0-shot)</td>
-            <td>74.0</td>
-            <td>72.1</td>
-            <td>72.9</td>
-            <td>71.0</td>
             <td>74.2</td>
             <td>72.7</td>
             <td>73.2</td>
@@ -285,10 +168,6 @@ print(tokenizer.decode(outputs[0]))
         </tr>
         <tr>
             <td>OpenbookQA (0-shot)</td>
-            <td>33.4</td>
-            <td>35.2</td>
-            <td>33.6</td>
-            <td>31.4</td>
             <td>34.0</td>
             <td>35.4</td>
             <td>36.4</td>
@@ -300,5 +179,15 @@ print(tokenizer.decode(outputs[0]))
 # Citation

 language:
 - en
+- fr
+- es
+- pt
 tags:
 - falcon3
 ---
+# Falcon3-7B-Base
+The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B parameters.
+This repository contains **Falcon3-7B-Base**. At the time of release, it achieves state-of-the-art results on reasoning, language understanding, instruction following, code and mathematics tasks.
+Falcon3-7B-Base supports 4 languages (English, French, Spanish, Portuguese) and a context length of up to 32K tokens.
+⚠️ **This is a raw, pretrained model, which should be further fine-tuned for most use cases.**
+## Model Details
+- Architecture
+  - Transformer-based, causal decoder-only architecture
+  - 28 decoder blocks
+  - Grouped-query attention (GQA) for faster inference: 12 query heads and 4 key-value (KV) heads
+  - Wider head dimension: 256
+  - High RoPE value (1000042) to support long-context understanding
+  - 32K context length
+  - 131K vocabulary size
+- Pretrained on 14 Teratokens of web, code, STEM, high-quality and multilingual data using 2048 H100 GPUs
+- Supports EN, FR, ES, PT
+- Developed by [Technology Innovation Institute](https://www.tii.ae)
+- License: TII Falcon-LLM License 2.0
+- Model Release Date: December 2024
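The architecture figures above can be turned into a few back-of-the-envelope numbers. The sketch below copies the layer count, head counts, head dimension and RoPE value from the list. It also makes three assumptions that the card does not state:

- "32k" means 32,768 tokens.
- The KV cache is stored in bfloat16.
- RoPE uses the standard Llama-style inverse-frequency formula over the full head dimension.

The multi-head attention (MHA) comparison is hypothetical.

```python
import math

# Values copied from the architecture list above.
NUM_LAYERS = 28
NUM_QUERY_HEADS = 12
NUM_KV_HEADS = 4
HEAD_DIM = 256
ROPE_BASE = 1000042
# Assumptions (not stated in the card): "32k" means 32,768 tokens,
# and the KV cache is kept in bfloat16 (2 bytes per value).
CONTEXT_LEN = 32 * 1024
BYTES_PER_VALUE = 2


def kv_head_for_query_head(q_head: int) -> int:
    """GQA: consecutive groups of query heads share one key/value head."""
    group_size = NUM_QUERY_HEADS // NUM_KV_HEADS  # 12 // 4 = 3
    return q_head // group_size


def kv_cache_bytes(num_tokens: int, num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 = one key tensor + one value tensor per layer.
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * BYTES_PER_VALUE * num_tokens


def rope_inv_freq(base: float = ROPE_BASE, dim: int = HEAD_DIM) -> list[float]:
    """Standard RoPE inverse frequencies base**(-2i/dim), i = 0 .. dim/2 - 1."""
    return [base ** (-2 * i / dim) for i in range(dim // 2)]


if __name__ == "__main__":
    print("query head -> KV head:",
          [kv_head_for_query_head(h) for h in range(NUM_QUERY_HEADS)])
    gqa = kv_cache_bytes(CONTEXT_LEN)
    # Hypothetical comparison: same model with one KV head per query head (MHA).
    mha = kv_cache_bytes(CONTEXT_LEN, num_kv_heads=NUM_QUERY_HEADS)
    print(f"KV cache for {CONTEXT_LEN} tokens: "
          f"GQA {gqa / 2**30:.1f} GiB vs. MHA {mha / 2**30:.1f} GiB")
    inv_freq = rope_inv_freq()
    # Longest rotation period among the RoPE frequency pairs, in tokens.
    print(f"slowest RoPE period: {2 * math.pi / inv_freq[-1]:,.0f} tokens")
```

Under these assumptions, a full 32K-token sequence needs 3.5 GiB of KV cache with 4 KV heads, versus 10.5 GiB if every query head had its own KV head. Less cached data to read at each decoding step is one reason GQA tends to speed up inference.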
+## Getting started
 <details>
 <summary> Click to expand </summary>
 ```python
 import torch
+from transformers import pipeline
+pipe = pipeline(
+    "text-generation",
+    model="tiiuae/Falcon3-7B-Base",
+    torch_dtype=torch.bfloat16,
+    device_map="auto"
+)
+response = pipe("Question: How many hours in one day? Answer: ")
+print(response[0]['generated_text'])
 ```
 </details>
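Because this is a raw pretrained model, it is not tuned to follow instructions. Prompts are therefore usually written as a pattern for the model to continue, like the `Question: ... Answer: ` string above. Benchmark settings such as "5-shot" mean that solved examples are prepended in the same way.

The sketch below builds such a prompt. The helper names are hypothetical, and the format is not necessarily the one used by the internal evaluation pipeline.

```python
def build_few_shot_prompt(examples: list[tuple[str, str]], question: str) -> str:
    """Prepend solved (question, answer) pairs, then leave the last answer open."""
    blocks = [f"Question: {q} Answer: {a}" for q, a in examples]
    # Same trailing "Answer: " as the single-question prompt in the card.
    blocks.append(f"Question: {question} Answer: ")
    return "\n\n".join(blocks)


def extract_answer(completion: str) -> str:
    """Cut the generated text where the model starts writing a new question."""
    return completion.split("Question:")[0].strip()


if __name__ == "__main__":
    shots = [
        ("How many days in one week?", "7"),
        ("How many minutes in one hour?", "60"),
    ]
    prompt = build_few_shot_prompt(shots, "How many hours in one day?")
    print(prompt)
    # Example post-processing of a made-up completion string.
    print(extract_answer(" 24\n\nQuestion: How many seconds in one minute?"))
```

To try it with the model, pass `prompt` to the `pipe(...)` call above, for instance `pipe(prompt, max_new_tokens=16, return_full_text=False)`. Then apply `extract_answer` to `response[0]['generated_text']`.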
+<br>
+# Benchmarks
+The following table reports benchmark results from our internal evaluation pipeline:
 <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
     <colgroup>
         <col style="width: 7%;">
         <col style="width: 7%;">
         <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
     </colgroup>
     <thead>
         <tr>
             <th>Category</th>
             <th>Benchmark</th>
             <th>Gemma2-9B</th>
             <th>Yi1.5-9B</th>
             <th>Mistral-NeMo-12B</th>
         </tr>
     </thead>
         <tr>
             <td rowspan="3">General</td>
             <td>MMLU (5-shot)</td>
             <td>0</td>
             <td>69.6</td>
             <td>68.8</td>
         </tr>
         <tr>
             <td>MMLU-PRO (5-shot)</td>
             <td>0</td>
             <td>39.3</td>
             <td>34.7</td>
         </tr>
         <tr>
             <td>IFEval</td>
             <td>0</td>
             <td>29.1</td>
             <td>16.1</td>
         </tr>
         <tr>
             <td rowspan="2">Math</td>
             <td>GSM8K (5-shot)</td>
             <td>69.1</td>
             <td>63.8</td>
             <td>55.3</td>
         </tr>
         <tr>
             <td>MATH (4-shot)</td>
             <td>0</td>
             <td>9.2</td>
             <td>4.9</td>
         </tr>
         <tr>
             <td rowspan="4">Reasoning</td>
             <td>Arc Challenge (25-shot)</td>
             <td>63.7</td>
             <td>58.2</td>
             <td>60.6</td>
         </tr>
         <tr>
             <td>GPQA (0-shot)</td>
             <td>0</td>
             <td>36.6</td>
             <td>28.8</td>
         </tr>
         <tr>
             <td>MUSR (0-shot)</td>
             <td>0</td>
             <td>43.3</td>
             <td>39.2</td>
         </tr>
         <tr>
             <td>BBH (3-shot)</td>
             <td>0</td>
             <td>51.3</td>
             <td>50.2</td>
         </tr>
         <tr>
             <td rowspan="4">CommonSense Understanding</td>
             <td>PIQA (0-shot)</td>
             <td>81.4</td>
             <td>79.8</td>
             <td>81.4</td>
         </tr>
         <tr>
             <td>SciQ (0-shot)</td>
             <td>97.2</td>
             <td>95.8</td>
             <td>96.4</td>
         </tr>
         <tr>
             <td>Winogrande (0-shot)</td>
             <td>74.2</td>
             <td>72.7</td>
             <td>73.2</td>
         </tr>
         <tr>
             <td>OpenbookQA (0-shot)</td>
             <td>34.0</td>
             <td>35.4</td>
             <td>36.4</td>
         </tr>
 </table>

 # Citation
+If the Falcon3 family of models was helpful to your work, feel free to cite us.
+```
+@misc{Falcon3,
+    title = {Falcon 3 family of Open Foundation Models},
+    author = {TII Team},
+    month = {December},
+    year = {2024}
+}
+```