amr-mohamed
committed on
Updated README
README.md
CHANGED
---

# JAIS Initiative: Atlas-Chat Models

## Model Overview

Atlas-Chat is a family of open models instruction-tuned for Darija, the colloquial Arabic of Morocco, developed as part of the [Jais](https://arxiv.org/abs/2308.16149) project for standard Arabic and its extensions to dialectal Arabic. These models are designed for language generation and excel in various applications such as question answering, summarization, and translation. Thanks to their compact size, Atlas-Chat models can be deployed in resource-constrained environments like laptops, desktops, or personal cloud setups, making advanced AI accessible to Darija speakers and promoting widespread innovation. Three sizes are available:

* [Atlas-Chat-2B](https://huggingface.co/MBZUAI-Paris/Atlas-Chat-2B): A small-sized version with 2 billion parameters, capable of generating fluent Moroccan Darija text while maintaining efficiency.
* [Atlas-Chat-9B](https://huggingface.co/MBZUAI-Paris/Atlas-Chat-9B): A medium-sized version with 9 billion parameters, providing more nuanced, contextually rich language generation for complex tasks.
* [Atlas-Chat-27B](https://huggingface.co/MBZUAI-Paris/Atlas-Chat-27B): A large-sized version with 27 billion parameters, offering even more advanced capabilities for complex tasks and nuanced language generation compared to the 2B and 9B versions.
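
For a quick start, here is a minimal sketch of loading one of the models with the standard `transformers` chat pipeline; the Darija prompt and generation settings are illustrative, not an official recipe:

```python
# Minimal sketch: chat with Atlas-Chat via the transformers pipeline API.
# The prompt and sampling settings are illustrative only.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="MBZUAI-Paris/Atlas-Chat-9B",  # or Atlas-Chat-2B / Atlas-Chat-27B
    device_map="auto",
)

messages = [{"role": "user", "content": "شكون نتا؟"}]  # "Who are you?" in Darija
out = pipe(messages, max_new_tokens=128)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```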

The models are designed to assist with:

[...]

Atlas-Chat models are based on Gemma 2 models. They were trained on 8 Nvidia A100 80 GB GPUs in parallel, using FSDP on AWS SageMaker, with the Hugging Face Transformers library and parameter-efficient fine-tuning (LoRA with a rank of 256).
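
A minimal sketch of that setup with the `peft` library; only the rank of 256 comes from this card, while the base checkpoint, alpha, and target modules below are assumptions:

```python
# Sketch of the LoRA configuration described above. Only r=256 is stated
# in the card; every other hyperparameter here is an assumption.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("google/gemma-2-9b-it")  # assumed base
config = LoraConfig(
    r=256,                                                    # rank from the card
    lora_alpha=128,                                           # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```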

<!-- (older combined evaluation table, superseded by the two tables below) -->

## Evaluation

The Atlas-Chat models were evaluated on a comprehensive suite of tasks using various datasets and benchmarks to assess their performance across multiple dimensions. These included tasks such as:

* **DarijaMMLU:** A Darija version of the ArabicMMLU and MMLU benchmarks, translated from MSA and English respectively.
* **DarijaHellaSwag:** A Darija version of HellaSwag.
* **Belebele Ary_Arab:** Belebele is a multiple-choice machine reading comprehension dataset published by Facebook, spanning 122 language variants. Evaluation is done on the Ary_Arab portion of Belebele, which corresponds to Darija.
* **DarijaAlpacaEval:** A Darija version of AlpacaEval, translated to Darija and adapted to Moroccan culture.
* **Sentiment Analysis.**
* **Translation:** Covering six directions across four languages: Darija, MSA, English, and French.
* **Transliteration:** Transforming a sentence from Darija (written in Arabic characters) to Arabizi (written in Latin characters) and vice versa.
* **Summarization.**

The models were compared against a collection of existing open-source Arabic models to gauge their effectiveness, with a particular focus on performance in Darija. All scores are based on zero-shot performance. The prompts are written mainly in Darija. The metric used for DarijaMMLU, DarijaHellaSwag, Belebele Ary, and Sentiment Analysis is normalized accuracy. We used the [Language Model Evaluation Harness](https://github.com/MBZUAI-Paris/lm-evaluation-harness-atlas-chat) to conduct these evaluations.
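
A zero-shot run through the harness's Python API might look like the sketch below; the task name `darijammlu` is an assumption about how the fork registers its Darija tasks:

```python
# Sketch of a zero-shot evaluation with the (forked) LM Evaluation Harness.
# The task name "darijammlu" is assumed, not confirmed by the card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=MBZUAI-Paris/Atlas-Chat-9B",
    tasks=["darijammlu"],
    num_fewshot=0,  # all reported scores are zero-shot
)
print(results["results"])
```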

**LLM Benchmarks:**
<table>
  <tr>
    <td>Model</td>
    <td><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaMMLU" target="_blank">DarijaMMLU</a></td>
    <td><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaHellaSwag" target="_blank">DarijaHellaSwag</a></td>
    <td><a href="https://huggingface.co/datasets/facebook/belebele/viewer/ary_Arab" target="_blank">Belebele Ary</a></td>
    <td><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaAlpacaEval" target="_blank">DarijaAlpacaEval</a></td>
  </tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-family-1p3b-chat" target="_blank">jais-family-1p3b-chat</a></td><td>35.39</td><td>32.51</td><td>38.33</td><td>35.56</td></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-family-2p7b-chat" target="_blank">jais-family-2p7b-chat</a></td><td>37.44</td><td>34.49</td><td>44.11</td><td>52.97</td></tr>
  <tr><td><a href="https://huggingface.co/google/gemma-2-2b-it" target="_blank">gemma-2-2b-it</a></td><td>28.58</td><td>32.42</td><td>25.22</td><td>58.67</td></tr>
  <tr><td><a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" target="_blank">Llama-3.2-1B-Instruct</a></td><td>27.66</td><td>26.88</td><td>28.89</td><td>23.57</td></tr>
  <tr><td><a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct" target="_blank">Llama-3.2-3B-Instruct</a></td><td>32.60</td><td>28.33</td><td>38.00</td><td>47.62</td></tr>
  <tr><td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-2B" target="_blank">Atlas-Chat-2B</a></strong></td><td><b>44.97</b></td><td><b>41.48</b></td><td><b>53.89</b></td><td><b>92.31</b></td></tr>
  <tr style="border-top: 4px solid;"></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-family-6p7b-chat" target="_blank">jais-family-6p7b-chat</a></td><td>39.96</td><td>41.57</td><td>51.22</td><td>65.18</td></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-adapted-7b-chat" target="_blank">jais-adapted-7b-chat</a></td><td>39.30</td><td>35.19</td><td>43.67</td><td>61.84</td></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-family-13b-chat" target="_blank">jais-family-13b-chat</a></td><td>45.11</td><td>43.90</td><td>58.67</td><td>69.93</td></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-adapted-13b-chat" target="_blank">jais-adapted-13b-chat</a></td><td>45.20</td><td>40.65</td><td>49.67</td><td>77.52</td></tr>
  <tr><td><a href="https://huggingface.co/FreedomIntelligence/AceGPT-7B-chat" target="_blank">AceGPT-7b-chat</a></td><td>35.98</td><td>36.57</td><td>30.11</td><td>47.31</td></tr>
  <tr><td><a href="https://huggingface.co/FreedomIntelligence/AceGPT-13B-chat" target="_blank">AceGPT-13b-chat</a></td><td>41.09</td><td>38.35</td><td>33.11</td><td>52.79</td></tr>
  <tr><td><a href="https://huggingface.co/google/gemma-2-9b-it" target="_blank">gemma-2-9b-it</a></td><td>35.91</td><td>42.43</td><td>31.00</td><td>90.86</td></tr>
  <tr><td><a href="https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct" target="_blank">Llama-3.1-8B-Instruct</a></td><td>44.13</td><td>38.24</td><td>47.00</td><td>78.08</td></tr>
  <tr><td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-9B" target="_blank">Atlas-Chat-9B</a></strong></td><td><b>58.23</b></td><td><b>57.75</b></td><td><b>74.56</b></td><td><b>95.62</b></td></tr>
  <tr style="border-top: 4px solid;"></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-family-30b-8k-chat" target="_blank">jais-family-30b-8k-chat</a></td><td>51.88</td><td>35.61</td><td>65.67</td><td>24.64</td></tr>
  <tr><td><a href="https://huggingface.co/google/gemma-2-27b-it" target="_blank">gemma-2-27b-it</a></td><td>36.47</td><td>37.04</td><td>35.78</td><td>95.07</td></tr>
  <tr><td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-27B" target="_blank">Atlas-Chat-27B</a></strong></td><td><b>61.95</b></td><td><b>48.37</b></td><td><b>75.67</b></td><td><b>96.58</b></td></tr>
</table>

**Standard NLP Tasks:**
<table>
  <tr>
    <td rowspan="2">Model</td>
    <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">DODa-10k (Translation)</a></td>
    <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">MADAR (Translation)</a></td>
    <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">FLORES+ (Translation)</a></td>
    <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">NLLB-Seed (Translation)</a></td>
    <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">DODa-10k (Transliteration)</a></td>
    <td rowspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">MArSum (Summarization)</a><br/>(LLM as a judge)</td>
    <td rowspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">Sentiment Analysis</a></td>
  </tr>
  <tr><td>BLEU</td><td>chrF</td><td>BLEU</td><td>chrF</td><td>BLEU</td><td>chrF</td><td>BLEU</td><td>chrF</td><td>BLEU</td><td>chrF</td></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-family-1p3b-chat" target="_blank">jais-family-1p3b-chat</a></td><td>00.13</td><td>06.18</td><td>00.50</td><td>15.43</td><td>02.44</td><td>19.14</td><td>01.99</td><td>12.60</td><td>00.01</td><td>03.01</td><td>00.50</td><td>45.29</td></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-family-2p7b-chat" target="_blank">jais-family-2p7b-chat</a></td><td>00.25</td><td>07.46</td><td>00.62</td><td>16.36</td><td>04.25</td><td>18.22</td><td>03.10</td><td>08.19</td><td>00.01</td><td>03.27</td><td>00.90</td><td>51.56</td></tr>
  <tr><td><a href="https://huggingface.co/google/gemma-2-2b-it" target="_blank">gemma-2-2b-it</a></td><td>00.10</td><td>04.96</td><td>00.12</td><td>06.66</td><td>01.55</td><td>18.59</td><td>02.78</td><td>23.69</td><td>00.01</td><td>02.08</td><td>06.80</td><td>53.36</td></tr>
  <tr><td><a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" target="_blank">Llama-3.2-1B-Instruct</a></td><td>00.07</td><td>05.95</td><td>00.80</td><td>18.71</td><td>04.53</td><td>18.39</td><td>04.52</td><td>17.06</td><td>00.02</td><td>03.74</td><td>08.23</td><td>46.27</td></tr>
  <tr><td><a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct" target="_blank">Llama-3.2-3B-Instruct</a></td><td>00.62</td><td>13.67</td><td>01.18</td><td>22.12</td><td>08.59</td><td>35.21</td><td>13.75</td><td>43.63</td><td>00.21</td><td>09.68</td><td>08.23</td><td>49.20</td></tr>
  <tr><td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-2B" target="_blank">Atlas-Chat-2B</a></strong></td><td><b>22.76</b></td><td><b>44.86</b></td><td><b>16.67</b></td><td><b>41.64</b></td><td><b>14.92</b></td><td><b>43.03</b></td><td><b>23.88</b></td><td><b>52.19</b></td><td><b>08.18</b></td><td><b>21.54</b></td><td><b>55.22</b></td><td><b>73.99</b></td></tr>
  <tr style="border-top: 4px solid;"></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-family-6p7b-chat" target="_blank">jais-family-6p7b-chat</a></td><td>00.73</td><td>11.85</td><td>01.88</td><td>23.22</td><td>04.25</td><td>18.22</td><td>04.62</td><td>20.22</td><td>00.02</td><td>03.79</td><td>03.02</td><td>56.78</td></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-adapted-7b-chat" target="_blank">jais-adapted-7b-chat</a></td><td>00.60</td><td>09.43</td><td>03.45</td><td>25.88</td><td>07.25</td><td>23.21</td><td>01.25</td><td>02.22</td><td>00.04</td><td>03.24</td><td>02.82</td><td>52.72</td></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-family-13b-chat" target="_blank">jais-family-13b-chat</a></td><td>00.92</td><td>11.71</td><td>04.01</td><td>28.48</td><td>05.70</td><td>27.24</td><td>04.50</td><td>22.56</td><td>00.03</td><td>03.57</td><td>01.77</td><td>41.73</td></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-adapted-13b-chat" target="_blank">jais-adapted-13b-chat</a></td><td>00.87</td><td>10.52</td><td>04.02</td><td>25.29</td><td>06.66</td><td>23.46</td><td>20.14</td><td>47.87</td><td>00.04</td><td>04.77</td><td>01.92</td><td>66.68</td></tr>
  <tr><td><a href="https://huggingface.co/FreedomIntelligence/AceGPT-7B-chat" target="_blank">AceGPT-7b-chat</a></td><td>00.44</td><td>11.33</td><td>01.05</td><td>19.24</td><td>06.92</td><td>36.03</td><td>11.05</td><td>44.55</td><td>00.06</td><td>04.74</td><td>02.28</td><td>40.23</td></tr>
  <tr><td><a href="https://huggingface.co/FreedomIntelligence/AceGPT-13B-chat" target="_blank">AceGPT-13b-chat</a></td><td>00.98</td><td>16.70</td><td>00.81</td><td>20.23</td><td>08.73</td><td>40.76</td><td>14.02</td><td>48.28</td><td>00.12</td><td>06.32</td><td>02.80</td><td>59.58</td></tr>
  <tr><td><a href="https://huggingface.co/google/gemma-2-9b-it" target="_blank">gemma-2-9b-it</a></td><td>03.10</td><td>19.16</td><td>01.72</td><td>24.35</td><td>05.18</td><td>36.96</td><td>08.23</td><td>43.57</td><td>00.17</td><td>09.14</td><td>13.81</td><td>59.87</td></tr>
  <tr><td><a href="https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct" target="_blank">Llama-3.1-8B-Instruct</a></td><td>00.92</td><td>14.19</td><td>01.46</td><td>23.82</td><td>08.89</td><td>33.08</td><td>11.85</td><td>35.51</td><td>00.11</td><td>06.02</td><td>16.14</td><td>44.08</td></tr>
  <tr><td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-9B" target="_blank">Atlas-Chat-9B</a></strong></td><td><b>28.08</b></td><td><b>50.48</b></td><td><b>18.16</b></td><td><b>43.91</b></td><td><b>18.63</b></td><td><b>47.53</b></td><td><b>29.98</b></td><td><b>58.26</b></td><td><b>22.08</b></td><td><b>34.17</b></td><td><b>59.76</b></td><td><b>81.89</b></td></tr>
  <tr style="border-top: 4px solid;"></tr>
  <tr><td><a href="https://huggingface.co/inceptionai/jais-family-30b-8k-chat" target="_blank">jais-family-30b-8k-chat</a></td><td>01.10</td><td>14.40</td><td>01.67</td><td>23.37</td><td>08.52</td><td>35.41</td><td>13.71</td><td>41.33</td><td>00.05</td><td>04.48</td><td>00.46</td><td>56.73</td></tr>
  <tr><td><a href="https://huggingface.co/google/gemma-2-27b-it" target="_blank">gemma-2-27b-it</a></td><td>00.67</td><td>13.04</td><td>01.74</td><td>24.63</td><td>05.17</td><td>37.08</td><td>07.36</td><td>42.49</td><td>00.03</td><td>04.94</td><td>11.10</td><td>57.59</td></tr>
  <tr><td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-27B" target="_blank">Atlas-Chat-27B</a></strong></td><td><b>29.55</b></td><td><b>51.74</b></td><td><b>19.66</b></td><td><b>45.65</b></td><td><b>20.34</b></td><td><b>49.19</b></td><td><b>31.61</b></td><td><b>59.37</b></td><td><b>33.03</b></td><td><b>40.95</b></td><td><b>60.70</b></td><td>73.00</td></tr>
</table>
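
The BLEU and chrF figures above are corpus-level scores on a 0–100 scale. As an illustration of how such scores are computed, here is a sketch using `sacrebleu`, a common choice; the card does not name the scoring library it used:

```python
# Illustration of corpus-level BLEU and chrF, the metrics reported in the
# translation and transliteration columns above. sacrebleu is an assumed
# scorer; the card does not name the library it used.
import sacrebleu

hypotheses = ["the cat sits on the mat"]          # system outputs
references = [["the cat is sitting on the mat"]]  # one inner list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}  chrF: {chrf.score:.2f}")
```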

## Usage and Limitations