slimfrikha-tii committed
Commit 15015d2
Parent(s): e37f587

docs(readme): benchs

README.md CHANGED
```diff
@@ -23,6 +23,7 @@ Falcon3-7B-Instruct supports 4 languages (english, french, spanish, portuguese)
 - Grouped query attention (GQA) for faster inference: 12 query heads and 4 KV heads
 - Wider head dimension: 256
 - High RoPE value to support long context understanding: 1000042
+- Uses SwiGLU and RMSNorm
 - 32k context length
 - 131k vocab size
 - Pretrained on 14 Gigatokens of datasets comprising web, code, STEM, high-quality and multilingual data using 2048 H100 GPU chips
```
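The bullets in this hunk fully determine the attention shape. As a quick sanity check, here is a minimal sketch (added for illustration, not part of the commit; the 3072 width is derived from 12 × 256 rather than quoted from the model config):

```python
# Sanity-check sketch for the Falcon3-7B attention geometry described above.
num_query_heads = 12    # query heads (GQA)
num_kv_heads = 4        # key/value heads (GQA)
head_dim = 256          # "wider head dimension"
rope_theta = 1_000_042  # "high RoPE value" for long-context support

# In GQA, query heads are partitioned into groups that share one KV head:
queries_per_kv_group = num_query_heads // num_kv_heads
assert queries_per_kv_group == 3

# Attention width implied by the bullets (derived, not from the model card):
attn_width = num_query_heads * head_dim
assert attn_width == 3072

# The KV cache stores only num_kv_heads * head_dim per layer and projection,
# i.e. a third of what full multi-head attention (12 KV heads) would store:
kv_width = num_kv_heads * head_dim
assert kv_width == 1024
```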
```diff
@@ -49,7 +50,7 @@ model_name = "tiiuae/Falcon3-7B-Instruct"
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     torch_dtype="auto",
-    device_map="auto"
+    device_map="auto"]
 )
 tokenizer = AutoTokenizer.from_pretrained(model_name)
 
```
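Note that the `+` line in this hunk carries a stray `]` after `device_map="auto"`, so the snippet as committed will not parse. For reference, a runnable version of the quoted usage code (the chat prompt and generation settings are illustrative additions; `device_map="auto"` assumes `accelerate` is installed):

```python
# Runnable sketch of the README usage snippet; the stray "]" from the diff is dropped.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/Falcon3-7B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",   # load in the checkpoint's native dtype
    device_map="auto",    # place layers on available devices; needs `accelerate`
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Illustrative chat-style generation (not part of the diff):
messages = [{"role": "user", "content": "What is grouped query attention?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```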
```diff
@@ -90,8 +91,6 @@ We report in the following table our internal pipeline benchmarks:
 <col style="width: 10%;">
 <col style="width: 7%;">
 <col style="width: 7%;">
-<col style="width: 7%;">
-<col style="width: 7%;">
 <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
 </colgroup>
 <thead>
```
```diff
@@ -99,9 +98,7 @@ We report in the following table our internal pipeline benchmarks:
 <th>Category</th>
 <th>Benchmark</th>
 <th>Llama-3.1-8B-Instruct</th>
-<th>Qwen2-7B-Instruct</th>
 <th>Qwen2.5-7B-Instruct</th>
-<th>gemma-2-9b-it</th>
 <th>Falcon3-7B-Instruct</th>
 </tr>
 </thead>
```
```diff
@@ -109,110 +106,115 @@ We report in the following table our internal pipeline benchmarks:
 <tr>
 <td rowspan="3">General</td>
 <td>MMLU (5-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td>55.9</td>
+<td><b>72.4</b></td>
+<td>68</td>
 </tr>
 <tr>
 <td>MMLU-PRO (5-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td>21.8</td>
+<td>35.8</td>
+<td><b>40.7</b></td>
 </tr>
 <tr>
 <td>IFEval</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td><b>78.8</b></td>
+<td>74.7</td>
+<td>76.5</td>
 </tr>
 <tr>
-<td rowspan="2">Math</td>
+<td rowspan="3">Math</td>
 <td>GSM8K (5-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td>19.2</td>
+<td>33.7</td>
+<td><b>78.8</b></td>
+</tr>
+<tr>
+<td>GSM8K (8-shot, COT)</td>
+<td>79.8</td>
+<td>72.7</td>
+<td><b>80.9</b></td>
 </tr>
 <tr>
 <td>MATH (4-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td>10.4</td>
+<td>26</td>
+<td><b>33.1</b></td>
 </tr>
 <tr>
-<td rowspan="4">Reasoning</td>
+<td rowspan="6">Reasoning</td>
 <td>Arc Challenge (25-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td>46.6</td>
+<td>55.7</td>
+<td><b>65.9</b></td>
 </tr>
 <tr>
 <td>GPQA (0-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td><b>33.6</b></td>
+<td>31.9</td>
+<td>32</td>
+</tr>
+<tr>
+<td>GPQA (0-shot, COT)</td>
+<td>9.6</td>
+<td>13.8</td>
+<td><b>22.3</b></td>
 </tr>
 <tr>
 <td>MUSR (0-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td>38.6</td>
+<td>40.7</td>
+<td><b>46.4</b></td>
 </tr>
 <tr>
 <td>BBH (3-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td>43.7</td>
+<td><b>53.9</b></td>
+<td>52.4</td>
+</tr>
+<tr>
+<td>BBH (3-shot, COT)</td>
+<td>6.7</td>
+<td>21.2</td>
+<td><b>69.3</b></td>
 </tr>
 <tr>
 <td rowspan="4">CommonSense Understanding</td>
 <td>PIQA (0-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td><b>78.9</b></td>
+<td>73.7</td>
+<td>78.8</td>
 </tr>
 <tr>
 <td>SciQ (0-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td>80.2</td>
+<td>50.9</td>
+<td><b>94.7</b></td>
 </tr>
 <tr>
 <td>Winogrande (0-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td>TODO</td>
+<td>TODO</td>
+<td>70.4</td>
 </tr>
 <tr>
 <td>OpenbookQA (0-shot)</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
-<td>-</td>
+<td><b>46.2</b></td>
+<td>42.4</td>
+<td>45.8</td>
+</tr>
+<tr>
+<td rowspan="2">Instruction following</td>
+<td>MT-Bench (avg)</td>
+<td>7.86</td>
+<td><b>8.54</b></td>
+<td>8.36</td>
+</tr>
+<tr>
+<td>Alpaca (WC)</td>
+<td>26.57</td>
+<td><b>31.5</b></td>
+<td>26.13</td>
 </tr>
 </tbody>
 </table>
```