Files changed (1)
  1. README.md +87 -79
README.md CHANGED
@@ -2,38 +2,27 @@
  language:
  - en
  tags:
- - falcon3
- - falcon3_mamba
- - falcon_mamba
  base_model:
  - tiiuae/Falcon3-Mamba-7B-Base
- license: other
- license_name: falcon-llm-license
- license_link: https://falconllm.tii.ae/falcon-terms-and-conditions.html
- library_name: transformers
  ---
 
- <div align="center">
- <img src="https://huggingface.co/datasets/tiiuae/documentation-images/resolve/main/falcon_mamba/falcon-mamba-logo.png" alt="drawing" width="500"/>
- </div>
-
-
  # Falcon3-Mamba-7B-Instruct
 
  The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
 
- This repository contains **Falcon3-Mamba-7B-Instruct**. Compared to similar SSM-based models of the same size, it achieves state-of-the-art results (at release time) on reasoning, language understanding, instruction following, code, and mathematics tasks.
- Falcon3-Mamba-7B-Instruct supports a context length of up to 32K and was trained mainly on an English corpus.
 
  ## Model Details
- - Architecture (same as [Falcon-Mamba-7b](https://huggingface.co/tiiuae/falcon-mamba-7b))
  - Mamba1-based, causal decoder-only architecture trained on a causal language modeling task (i.e., predicting the next token)
  - 64 decoder blocks
  - width: 4096
  - state_size: 16
  - 32k context length
  - 65k vocab size
- - Continue-pretrained from [Falcon Mamba 7B](https://huggingface.co/tiiuae/falcon-mamba-7b) with another 1,500 gigatokens of data comprising web, code, STEM, and high-quality data
  - Post-trained on 1.2 million samples of STEM, conversation, code, and safety data
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
  - License: TII Falcon-LLM License 2.0
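The dimensions listed above admit a quick back-of-envelope check. As a minimal sketch (the exact vocabulary size is an assumption; the card only states "65k"), the token-embedding table alone already accounts for roughly a quarter of a billion parameters:

```python
# Back-of-envelope arithmetic from the dimensions listed above.
# VOCAB_SIZE is approximate: the card only states "65k vocab size".
VOCAB_SIZE = 65_000
WIDTH = 4096  # model width (hidden size)

embedding_params = VOCAB_SIZE * WIDTH
print(f"embedding table: ~{embedding_params / 1e6:.0f}M parameters")
# → embedding table: ~266M parameters
```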
@@ -89,7 +78,7 @@ print(response)
  <br>
 
  # Benchmarks
- We report our internal pipeline benchmarks in the following table. For the benchmarks marked with a star, we normalize the results with the HuggingFace score normalization:
 
  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
  <colgroup>
@@ -98,6 +87,7 @@ We report in the following table our internal pipeline benchmarks. For the bench
  <col style="width: 7%;">
  <col style="width: 7%;">
  <col style="width: 7%;">
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
  </colgroup>
  <thead>
@@ -105,114 +95,132 @@ We report in the following table our internal pipeline benchmarks. For the bench
  <th>Category</th>
  <th>Benchmark</th>
  <th>Zamba2-7B-instruct</th>
- <th>Jamba-1.5-Mini</th>
  <th>Llama-3.1-8B-Instruct</th>
- <th>Falcon3-Mamba-7B-Instruct</th>
  </tr>
  </thead>
  <tbody>
  <tr>
  <td rowspan="3">General</td>
  <td>MMLU (5-shot)</td>
- <td>30.6</td>
- <td>68.7</td>
- <td>55.9</td>
- <td>65.3</td>
  </tr>
  <tr>
- <td>MMLU-PRO (5-shot)*</td>
- <td>32.4</td>
- <td>31.6</td>
- <td>21.8</td>
- <td>26.3</td>
  </tr>
  <tr>
  <td>IFEval</td>
- <td>69.9</td>
- <td>65.7</td>
- <td>78.8</td>
- <td>71.7</td>
  </tr>
  <tr>
  <td rowspan="2">Math</td>
  <td>GSM8K (5-shot)</td>
- <td>0</td>
- <td>74.9</td>
- <td>19.2</td>
- <td>65.2</td>
  </tr>
  <tr>
- <td>MATH Lvl-5 (4-shot)</td>
- <td>13.6</td>
- <td>6.9</td>
- <td>10.4</td>
- <td>27.3</td>
  </tr>
  <tr>
  <td rowspan="4">Reasoning</td>
  <td>Arc Challenge (25-shot)</td>
- <td>54</td>
- <td>54.3</td>
- <td>46.6</td>
- <td>53.7</td>
  </tr>
  <tr>
- <td>GPQA (0-shot)*</td>
- <td>10.3</td>
- <td>11.1</td>
- <td>6.2</td>
- <td>7.2</td>
  </tr>
  <tr>
- <td>MUSR (0-shot)*</td>
- <td>8.2</td>
- <td>12.2</td>
- <td>38.6</td>
- <td>8.3</td>
  </tr>
  <tr>
- <td>BBH (3-shot)*</td>
- <td>33.3</td>
- <td>35.3</td>
- <td>43.7</td>
- <td>25.2</td>
  </tr>
  <tr>
  <td rowspan="4">CommonSense Understanding</td>
  <td>PIQA (0-shot)</td>
- <td>75.6</td>
- <td>82.3</td>
- <td>78.9</td>
- <td>80.9</td>
  </tr>
  <tr>
  <td>SciQ (0-shot)</td>
- <td>29.2</td>
- <td>94.9</td>
- <td>80.2</td>
- <td>93.6</td>
  </tr>
  <tr>
  <td>OpenbookQA (0-shot)</td>
- <td>45.6</td>
- <td>45.8</td>
- <td>46.2</td>
- <td>47.2</td>
  </tr>
  </tbody>
  </table>
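For the benchmarks marked with a star, the scores are normalized. A minimal sketch of HuggingFace-style (Open LLM Leaderboard) score normalization, assuming the usual convention of rescaling between the random-guessing baseline and a perfect score, with sub-baseline scores clipped to zero:

```python
def normalize_score(raw_pct: float, random_baseline_pct: float) -> float:
    """Rescale a raw accuracy so the random baseline maps to 0 and 100 stays 100.

    Assumed convention (Open LLM Leaderboard-style); scores below the
    random baseline are clipped to 0.
    """
    if raw_pct < random_baseline_pct:
        return 0.0
    return 100.0 * (raw_pct - random_baseline_pct) / (100.0 - random_baseline_pct)

# e.g. a 4-option multiple-choice benchmark has a 25% random baseline:
print(normalize_score(70.0, 25.0))  # → 60.0
```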
 
- ## Useful links
- - View our [release blogpost](https://huggingface.co/blog/falcon3).
- - Feel free to join [our discord server](https://discord.gg/fwXpMyGc) if you have any questions or want to interact with our researchers and developers.
 
- ## Citation
- If the Falcon3 family of models was helpful to your work, feel free to cite us.
 
  ```
  @misc{Falcon3,
- title = {The Falcon 3 Family of Open Models},
- author = {Falcon-LLM Team},
  month = {December},
  year = {2024}
  }
 
  language:
  - en
  tags:
+ - falcon3-Mamba-Instruct
  base_model:
  - tiiuae/Falcon3-Mamba-7B-Base
  ---
 
  # Falcon3-Mamba-7B-Instruct
 
  The **Falcon3** family of Open Foundation Models is a set of pretrained and instruct LLMs ranging from 1B to 10B.
 
+ This repository contains **Falcon3-Mamba-7B-Instruct**. Compared to similar SSM-based models of the same size, it achieves state-of-the-art results (at release time) on reasoning, language understanding, instruction following, code, and mathematics tasks.
+ Falcon3-Mamba-7B-Instruct supports a context length of up to 32K and one language (English).
 
  ## Model Details
+ - Architecture (same as Falcon-Mamba-7b)
  - Mamba1-based, causal decoder-only architecture trained on a causal language modeling task (i.e., predicting the next token)
  - 64 decoder blocks
  - width: 4096
  - state_size: 16
  - 32k context length
  - 65k vocab size
+ - Pretrained on 7 teratokens of data comprising web, code, STEM, and high-quality data, using 2048 H100 GPU chips
  - Post-trained on 1.2 million samples of STEM, conversation, code, and safety data
  - Developed by [Technology Innovation Institute](https://www.tii.ae)
  - License: TII Falcon-LLM License 2.0
 
  <br>
 
  # Benchmarks
+ We report our internal pipeline benchmarks in the following table:
 
  <table border="1" style="width: 100%; text-align: center; border-collapse: collapse;">
  <colgroup>
  <col style="width: 7%;">
  <col style="width: 7%;">
  <col style="width: 7%;">
+ <col style="width: 7%;">
  <col style="background-color: rgba(80, 15, 213, 0.5); width: 7%;">
  </colgroup>
  <thead>
 
  <th>Category</th>
  <th>Benchmark</th>
  <th>Zamba2-7B-instruct</th>
+ <th>Jamba-1.5-Mini-instruct</th>
+ <th>falcon-mamba-7b-instruct</th>
  <th>Llama-3.1-8B-Instruct</th>
+ <th>Qwen2-7B-Instruct</th>
  </tr>
  </thead>
  <tbody>
  <tr>
  <td rowspan="3">General</td>
  <td>MMLU (5-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>68.5</td>
+ <td>-</td>
  </tr>
  <tr>
+ <td>MMLU-PRO (5-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>29.6</td>
+ <td>-</td>
  </tr>
  <tr>
  <td>IFEval</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>78.6</td>
+ <td>-</td>
  </tr>
  <tr>
  <td rowspan="2">Math</td>
  <td>GSM8K (5-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
  </tr>
  <tr>
+ <td>MATH (4-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
  </tr>
  <tr>
  <td rowspan="4">Reasoning</td>
  <td>Arc Challenge (25-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
  </tr>
  <tr>
+ <td>GPQA (0-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>2.4</td>
+ <td>-</td>
  </tr>
  <tr>
+ <td>MUSR (0-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>8.4</td>
+ <td>-</td>
  </tr>
  <tr>
+ <td>BBH (3-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>29.9</td>
+ <td>-</td>
  </tr>
  <tr>
  <td rowspan="4">CommonSense Understanding</td>
  <td>PIQA (0-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
  </tr>
  <tr>
  <td>SciQ (0-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ </tr>
+ <tr>
+ <td>Winogrande (0-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
  </tr>
  <tr>
  <td>OpenbookQA (0-shot)</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
+ <td>-</td>
  </tr>
  </tbody>
  </table>
 
 
+ # Citation
+ If the Falcon3 family of models was helpful to your work, feel free to cite us.
 
  ```
  @misc{Falcon3,
+ title = {The Falcon 3 Family of Open Models},
+ author = {TII Team},
  month = {December},
  year = {2024}
  }