amr-mohamed committed on
Commit
ee88e22
·
verified ·
1 Parent(s): 1cea0ca

Updated README

Files changed (1): README.md (+668 −4)
README.md CHANGED
@@ -14,14 +14,15 @@ base_model:
14
  ---
15
 
16
 
17
- # Atlas-Chat Model Card
18
 
19
 
20
  ## Model Overview
21
 
22
  Atlas-Chat is a family of open models instruction-tuned for Darija, the colloquial Arabic of Morocco, developed as part of the [Jais](https://arxiv.org/abs/2308.16149) project for standard Arabic and its extensions to dialectal Arabic. These models are designed for language generation and excel in various applications such as question answering, summarization, and translation. Thanks to their compact size, Atlas-Chat models can be deployed in resource-constrained environments like laptops, desktops, or personal cloud setups, making advanced AI accessible to Darija speakers and promoting widespread innovation. Two versions are available:
23
  * [Atlas-Chat-2B](https://huggingface.co/MBZUAI-Paris/Atlas-Chat-2B): A small-sized version with 2 billion parameters, capable of generating fluent Moroccan Darija text while maintaining efficiency.
24
- * [Atlas-Chat-9B](https://huggingface.co/MBZUAI-Paris/Atlas-Chat-9B): A larger version with 9 billion parameters, providing more nuanced, contextually rich language generation for complex tasks.
 
25
 
26
  The models are designed to assist with:
27
 
@@ -352,6 +353,7 @@ Our training dataset [Darija-SFT-Mixture](https://huggingface.co/datasets/MBZUAI
352
  Atlas-Chat models are based on Gemma 2 models. They were trained on 8 Nvidia A100 80 GB GPUs in parallel using FSDP on AWS SageMaker, with HuggingFace transformers and parameter-efficient fine-tuning at a LoRA rank of 256.
353
 
354
 
 
355
  ## Evaluation
356
  The Atlas-Chat models were evaluated on a comprehensive suite of tasks using various datasets and benchmarks to assess their performance across multiple dimensions. These included tasks such as:
357
 
@@ -360,6 +362,7 @@ The Atlas-Chat models were evaluated on a comprehensive suite of tasks using var
360
  * **Belebele Ary_Arab:** Belebele is a multiple-choice machine reading comprehension dataset published by Facebook spanning 122 language variants. The evaluation is done on the Ary_Arab part of Belebele, which corresponds to Darija.
361
  * **Sentiment Analysis.**
362
  * **Translation:** Including six directions and four languages: Darija, MSA, English and French.
 
363
  * **Summarization.**
364
 
365
  The models were compared against a collection of existing open-source Arabic models to gauge their effectiveness, with a particular focus on performance in Darija. All scores are based on zero-shot performance. The prompts are written mainly in Darija. The metric used for DarijaMMLU, DarijaHellaSwag, Belebele Ary and Sentiment Analysis is the normalized accuracy. We used [Language Model Evaluation Harness](https://github.com/MBZUAI-Paris/lm-evaluation-harness-atlas-chat) to conduct these evaluations.
@@ -371,12 +374,24 @@ The models were compared against a collection of existing open-source Arabic mod
371
  <td rowspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaHellaSwag" target="_blank">DarijaHellaSwag</a></td>
372
  <td rowspan="2"><a href="https://huggingface.co/datasets/facebook/belebele/viewer/ary_Arab" target="_blank">Belebele Ary</a></td>
373
  <td rowspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">Sentiment Analysis</a></td>
374
- <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">DoDa-10k (Translation)</a></td>
375
  <td rowspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">MArSum (Summarization)</a><br/>(LLM as a judge)</td>
376
  </tr>
377
  <tr>
378
  <td>BLEU</td>
379
  <td>chrF</td>
380
  </tr>
381
  <tr>
382
  <td><a href="https://huggingface.co/inceptionai/jais-family-1p3b-chat" target="_blank">jais-family-1p3b-chat</a></td>
@@ -387,6 +402,14 @@ The models were compared against a collection of existing open-source Arabic mod
387
  <td>00.13</td>
388
  <td>06.18</td>
389
  <td>00.50</td>
390
  </tr>
391
  <tr>
392
  <td><a href="https://huggingface.co/inceptionai/jais-family-2p7b-chat" target="_blank">jais-family-2p7b-chat</a></td>
@@ -396,6 +419,14 @@ The models were compared against a collection of existing open-source Arabic mod
396
  <td>51.56</td>
397
  <td>00.25</td>
398
  <td>07.46</td>
399
  <td>00.90</td>
400
  </tr>
401
  <tr>
@@ -406,8 +437,52 @@ The models were compared against a collection of existing open-source Arabic mod
406
  <td>53.36</td>
407
  <td>00.10</td>
408
  <td>04.96</td>
409
  <td>06.80</td>
410
  </tr>
411
  <tr>
412
  <td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-2B" target="_blank">Atlas-Chat-2B</a></strong></td>
413
  <td><b>44.97</b></td>
@@ -416,6 +491,14 @@ The models were compared against a collection of existing open-source Arabic mod
416
  <td><b>73.99</b></td>
417
  <td><b>22.76</b></td>
418
  <td><b>44.86</b></td>
419
  <td><b>55.22</b></td>
420
  </tr>
421
  <tr style="border-top: 4px solid;"></tr>
@@ -427,6 +510,14 @@ The models were compared against a collection of existing open-source Arabic mod
427
  <td>56.78</td>
428
  <td>00.73</td>
429
  <td>11.85</td>
430
  <td>03.02</td>
431
  </tr>
432
  <tr>
@@ -437,6 +528,14 @@ The models were compared against a collection of existing open-source Arabic mod
437
  <td>52.72</td>
438
  <td>00.60</td>
439
  <td>09.43</td>
440
  <td>02.82</td>
441
  </tr>
442
  <tr>
@@ -447,6 +546,14 @@ The models were compared against a collection of existing open-source Arabic mod
447
  <td>41.73</td>
448
  <td>00.92</td>
449
  <td>11.71</td>
450
  <td>01.77</td>
451
  </tr>
452
  <tr>
@@ -457,6 +564,14 @@ The models were compared against a collection of existing open-source Arabic mod
457
  <td>66.68</td>
458
  <td>00.87</td>
459
  <td>10.52</td>
460
  <td>01.92</td>
461
  </tr>
462
  <tr>
@@ -467,6 +582,14 @@ The models were compared against a collection of existing open-source Arabic mod
467
  <td>40.23</td>
468
  <td>00.44</td>
469
  <td>11.33</td>
470
  <td>02.28</td>
471
  </tr>
472
  <tr>
@@ -477,6 +600,14 @@ The models were compared against a collection of existing open-source Arabic mod
477
  <td>59.58</td>
478
  <td>00.98</td>
479
  <td>16.70</td>
480
  <td>02.80</td>
481
  </tr>
482
  <tr>
@@ -487,6 +618,14 @@ The models were compared against a collection of existing open-source Arabic mod
487
  <td>59.87</td>
488
  <td>03.10</td>
489
  <td>19.16</td>
490
  <td>13.81</td>
491
  </tr>
492
  <tr>
@@ -497,6 +636,14 @@ The models were compared against a collection of existing open-source Arabic mod
497
  <td>44.08</td>
498
  <td>00.92</td>
499
  <td>14.19</td>
500
  <td>01.28</td>
501
  </tr>
502
  <tr>
@@ -507,13 +654,530 @@ The models were compared against a collection of existing open-source Arabic mod
507
  <td><b>81.89</b></td>
508
  <td><b>28.08</b></td>
509
  <td><b>50.48</b></td>
510
  <td><b>59.76</b></td>
511
  </tr>
512
 
513
 
514
 
515
  </table>
516
517
 
518
  ## Usage and Limitations
519
 
 
14
  ---
15
 
16
 
17
+ # JAIS Initiative: Atlas-Chat Models
18
 
19
 
20
  ## Model Overview
21
 
22
+ Atlas-Chat is a family of open models instruction-tuned for Darija, the colloquial Arabic of Morocco, developed as part of the [Jais](https://arxiv.org/abs/2308.16149) project for standard Arabic and its extensions to dialectal Arabic. These models are designed for language generation and excel in various applications such as question answering, summarization, and translation. Thanks to their compact size, Atlas-Chat models can be deployed in resource-constrained environments like laptops, desktops, or personal cloud setups, making advanced AI accessible to Darija speakers and promoting widespread innovation. Three sizes are available:
23
  * [Atlas-Chat-2B](https://huggingface.co/MBZUAI-Paris/Atlas-Chat-2B): A small-sized version with 2 billion parameters, capable of generating fluent Moroccan Darija text while maintaining efficiency.
24
+ * [Atlas-Chat-9B](https://huggingface.co/MBZUAI-Paris/Atlas-Chat-9B): A medium-sized version with 9 billion parameters, providing more nuanced, contextually rich language generation for complex tasks.
25
+ * [Atlas-Chat-27B](https://huggingface.co/MBZUAI-Paris/Atlas-Chat-27B): A large-sized version with 27 billion parameters, offering even more advanced capabilities for complex tasks and nuanced language generation compared to the 2B and 9B versions.
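Since Atlas-Chat is fine-tuned from Gemma 2, the Gemma 2 chat format presumably carries over (an assumption; `tokenizer.apply_chat_template` on the released checkpoints is authoritative). A minimal sketch of how a single-turn prompt would be laid out:

```python
# Sketch: building a Gemma 2-style chat prompt by hand. The control
# tokens (<start_of_turn>, <end_of_turn>) are assumed from the base
# model; in practice, load the tokenizer from the Hub and use
# tokenizer.apply_chat_template instead of this helper.

def build_prompt(messages):
    """Render a list of {role, content} dicts into a Gemma 2 chat prompt."""
    out = []
    for m in messages:
        # Gemma 2 uses the role name "model" for assistant turns.
        role = "model" if m["role"] == "assistant" else "user"
        out.append(f"<start_of_turn>{role}\n{m['content']}<end_of_turn>\n")
    out.append("<start_of_turn>model\n")  # cue the model to answer
    return "".join(out)

prompt = build_prompt([{"role": "user", "content": "شكون لي صنعك؟"}])
print(prompt)
```

The final `<start_of_turn>model` line leaves the generation open for the model's reply.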
26
 
27
  The models are designed to assist with:
28
 
 
353
  Atlas-Chat models are based on Gemma 2 models. They were trained on 8 Nvidia A100 80 GB GPUs in parallel using FSDP on AWS SageMaker, with HuggingFace transformers and parameter-efficient fine-tuning at a LoRA rank of 256.
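As a back-of-the-envelope illustration of why rank-256 LoRA is parameter-efficient, the sketch below counts trainable parameters for a single adapted square projection; the hidden size used is an illustrative assumption, not read from the model config:

```python
# Sketch: trainable-parameter count for a rank-r LoRA adapter on one
# weight matrix W (d_out x d_in). W stays frozen; only a low-rank
# update B @ A is trained, with A of shape (r, d_in) and B of (d_out, r).

def lora_params(d_out, d_in, r):
    return r * (d_in + d_out)

d = 3584              # illustrative hidden size (assumption)
full = d * d          # parameters in one full square projection
lora = lora_params(d, d, r=256)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

Even at the comparatively large rank of 256, the adapter trains only about one-seventh of the parameters of each adapted square projection.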
354
 
355
 
356
+ <!--
357
  ## Evaluation
358
  The Atlas-Chat models were evaluated on a comprehensive suite of tasks using various datasets and benchmarks to assess their performance across multiple dimensions. These included tasks such as:
359
 
 
362
  * **Belebele Ary_Arab:** Belebele is a multiple-choice machine reading comprehension dataset published by Facebook spanning 122 language variants. The evaluation is done on the Ary_Arab part of Belebele, which corresponds to Darija.
363
  * **Sentiment Analysis.**
364
  * **Translation:** Including six directions and four languages: Darija, MSA, English and French.
365
+ * **Transliteration:** Transforming a sentence from Darija (written in Arabic characters) to Arabizi (written in Latin characters) and vice versa.
366
  * **Summarization.**
367
 
368
  The models were compared against a collection of existing open-source Arabic models to gauge their effectiveness, with a particular focus on performance in Darija. All scores are based on zero-shot performance. The prompts are written mainly in Darija. The metric used for DarijaMMLU, DarijaHellaSwag, Belebele Ary and Sentiment Analysis is the normalized accuracy. We used [Language Model Evaluation Harness](https://github.com/MBZUAI-Paris/lm-evaluation-harness-atlas-chat) to conduct these evaluations.
 
374
  <td rowspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaHellaSwag" target="_blank">DarijaHellaSwag</a></td>
375
  <td rowspan="2"><a href="https://huggingface.co/datasets/facebook/belebele/viewer/ary_Arab" target="_blank">Belebele Ary</a></td>
376
  <td rowspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">Sentiment Analysis</a></td>
377
+ <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">DODa-10k (Translation)</a></td>
378
+ <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">MADAR (Translation)</a></td>
379
+ <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">FLORES+ (Translation)</a></td>
380
+ <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">NLLB-Seed (Translation)</a></td>
381
+ <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">DODa-10k (Transliteration)</a></td>
382
  <td rowspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">MArSum (Summarization)</a><br/>(LLM as a judge)</td>
383
  </tr>
384
  <tr>
385
  <td>BLEU</td>
386
  <td>chrF</td>
387
+ <td>BLEU</td>
388
+ <td>chrF</td>
389
+ <td>BLEU</td>
390
+ <td>chrF</td>
391
+ <td>BLEU</td>
392
+ <td>chrF</td>
393
+ <td>BLEU</td>
394
+ <td>chrF</td>
395
  </tr>
396
  <tr>
397
  <td><a href="https://huggingface.co/inceptionai/jais-family-1p3b-chat" target="_blank">jais-family-1p3b-chat</a></td>
 
402
  <td>00.13</td>
403
  <td>06.18</td>
404
  <td>00.50</td>
405
+ <td>15.43</td>
406
+ <td>02.44</td>
407
+ <td>19.14</td>
408
+ <td>01.99</td>
409
+ <td>12.60</td>
410
+ <td>00.01</td>
411
+ <td>03.01</td>
412
+ <td>00.50</td>
413
  </tr>
414
  <tr>
415
  <td><a href="https://huggingface.co/inceptionai/jais-family-2p7b-chat" target="_blank">jais-family-2p7b-chat</a></td>
 
419
  <td>51.56</td>
420
  <td>00.25</td>
421
  <td>07.46</td>
422
+ <td>00.62</td>
423
+ <td>16.36</td>
424
+ <td>04.25</td>
425
+ <td>18.22</td>
426
+ <td>03.10</td>
427
+ <td>08.19</td>
428
+ <td>00.01</td>
429
+ <td>03.27</td>
430
  <td>00.90</td>
431
  </tr>
432
  <tr>
 
437
  <td>53.36</td>
438
  <td>00.10</td>
439
  <td>04.96</td>
440
+ <td>00.12</td>
441
+ <td>06.66</td>
442
+ <td>01.55</td>
443
+ <td>18.59</td>
444
+ <td>02.78</td>
445
+ <td>23.69</td>
446
+ <td>00.01</td>
447
+ <td>02.08</td>
448
  <td>06.80</td>
449
  </tr>
450
+ <tr>
451
+ <td><a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" target="_blank">Llama-3.2-1B-Instruct</a></td>
452
+ <td>27.66</td>
453
+ <td>26.88</td>
454
+ <td>28.89</td>
455
+ <td>46.27</td>
456
+ <td>00.07</td>
457
+ <td>05.95</td>
458
+ <td>00.80</td>
459
+ <td>18.71</td>
460
+ <td>04.53</td>
461
+ <td>18.39</td>
462
+ <td>04.52</td>
463
+ <td>17.06</td>
464
+ <td>00.02</td>
465
+ <td>03.74</td>
466
+ <td>08.23</td>
467
+ </tr>
468
+ <tr>
469
+ <td><a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct" target="_blank">Llama-3.2-3B-Instruct</a></td>
470
+ <td>32.60</td>
471
+ <td>28.33</td>
472
+ <td>38.00</td>
473
+ <td>49.20</td>
474
+ <td>00.62</td>
475
+ <td>13.67</td>
476
+ <td>01.18</td>
477
+ <td>22.12</td>
478
+ <td>08.59</td>
479
+ <td>35.21</td>
480
+ <td>13.75</td>
481
+ <td>43.63</td>
482
+ <td>00.21</td>
483
+ <td>09.68</td>
484
+ <td>08.23</td>
485
+ </tr>
486
  <tr>
487
  <td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-2B" target="_blank">Atlas-Chat-2B</a></strong></td>
488
  <td><b>44.97</b></td>

491
  <td><b>73.99</b></td>
492
  <td><b>22.76</b></td>
493
  <td><b>44.86</b></td>
494
+ <td><b>16.67</b></td>
495
+ <td><b>41.64</b></td>
496
+ <td><b>14.92</b></td>
497
+ <td><b>43.03</b></td>
498
+ <td><b>23.88</b></td>
499
+ <td><b>52.19</b></td>
500
+ <td><b>08.18</b></td>
501
+ <td><b>21.54</b></td>
502
  <td><b>55.22</b></td>
503
  </tr>
504
  <tr style="border-top: 4px solid;"></tr>
 
510
  <td>56.78</td>
511
  <td>00.73</td>
512
  <td>11.85</td>
513
+ <td>01.88</td>
514
+ <td>23.22</td>
515
+ <td>04.25</td>
516
+ <td>18.22</td>
517
+ <td>04.62</td>
518
+ <td>20.22</td>
519
+ <td>00.02</td>
520
+ <td>03.79</td>
521
  <td>03.02</td>
522
  </tr>
523
  <tr>
 
528
  <td>52.72</td>
529
  <td>00.60</td>
530
  <td>09.43</td>
531
+ <td>03.45</td>
532
+ <td>25.88</td>
533
+ <td>07.25</td>
534
+ <td>23.21</td>
535
+ <td>01.25</td>
536
+ <td>02.22</td>
537
+ <td>00.04</td>
538
+ <td>03.24</td>
539
  <td>02.82</td>
540
  </tr>
541
  <tr>
 
546
  <td>41.73</td>
547
  <td>00.92</td>
548
  <td>11.71</td>
549
+ <td>04.01</td>
550
+ <td>28.48</td>
551
+ <td>05.70</td>
552
+ <td>27.24</td>
553
+ <td>04.50</td>
554
+ <td>22.56</td>
555
+ <td>00.03</td>
556
+ <td>03.57</td>
557
  <td>01.77</td>
558
  </tr>
559
  <tr>
 
564
  <td>66.68</td>
565
  <td>00.87</td>
566
  <td>10.52</td>
567
+ <td>04.02</td>
568
+ <td>25.29</td>
569
+ <td>06.66</td>
570
+ <td>23.46</td>
571
+ <td>20.14</td>
572
+ <td>47.87</td>
573
+ <td>00.04</td>
574
+ <td>04.77</td>
575
  <td>01.92</td>
576
  </tr>
577
  <tr>
 
582
  <td>40.23</td>
583
  <td>00.44</td>
584
  <td>11.33</td>
585
+ <td>01.05</td>
586
+ <td>19.24</td>
587
+ <td>06.92</td>
588
+ <td>36.03</td>
589
+ <td>11.05</td>
590
+ <td>44.55</td>
591
+ <td>00.06</td>
592
+ <td>04.74</td>
593
  <td>02.28</td>
594
  </tr>
595
  <tr>
 
600
  <td>59.58</td>
601
  <td>00.98</td>
602
  <td>16.70</td>
603
+ <td>00.81</td>
604
+ <td>20.23</td>
605
+ <td>08.73</td>
606
+ <td>40.76</td>
607
+ <td>14.02</td>
608
+ <td>48.28</td>
609
+ <td>00.12</td>
610
+ <td>06.32</td>
611
  <td>02.80</td>
612
  </tr>
613
  <tr>
 
618
  <td>59.87</td>
619
  <td>03.10</td>
620
  <td>19.16</td>
621
+ <td>01.72</td>
622
+ <td>24.35</td>
623
+ <td>05.18</td>
624
+ <td>36.96</td>
625
+ <td>08.23</td>
626
+ <td>43.57</td>
627
+ <td>00.17</td>
628
+ <td>09.14</td>
629
  <td>13.81</td>
630
  </tr>
631
  <tr>
 
636
  <td>44.08</td>
637
  <td>00.92</td>
638
  <td>14.19</td>
639
+ <td>01.46</td>
640
+ <td>23.82</td>
641
+ <td>08.89</td>
642
+ <td>33.08</td>
643
+ <td>11.85</td>
644
+ <td>35.51</td>
645
+ <td>00.11</td>
646
+ <td>06.02</td>
647
  <td>01.28</td>
648
  </tr>
649
  <tr>
 
654
  <td><b>81.89</b></td>
655
  <td><b>28.08</b></td>
656
  <td><b>50.48</b></td>
657
+ <td><b>18.16</b></td>
658
+ <td><b>43.91</b></td>
659
+ <td><b>18.63</b></td>
660
+ <td><b>47.53</b></td>
661
+ <td><b>29.98</b></td>
662
+ <td><b>58.26</b></td>
663
+ <td><b>22.08</b></td>
664
+ <td><b>34.17</b></td>
665
  <td><b>59.76</b></td>
666
  </tr>
667
+ <tr style="border-top: 4px solid;"></tr>
668
+ <tr>
669
+ <td><a href="https://huggingface.co/inceptionai/jais-family-30b-8k-chat" target="_blank">jais-family-30b-8k-chat</a></td>
670
+ <td>51.88</td>
671
+ <td>35.61</td>
672
+ <td>65.67</td>
673
+ <td>56.73</td>
674
+ <td>01.10</td>
675
+ <td>14.40</td>
676
+ <td>01.67</td>
677
+ <td>23.37</td>
678
+ <td>08.52</td>
679
+ <td>35.41</td>
680
+ <td>13.71</td>
681
+ <td>41.33</td>
682
+ <td>00.05</td>
683
+ <td>04.48</td>
684
+ <td>00.46</td>
685
+ </tr>
686
+ <tr>
687
+ <td><a href="https://huggingface.co/google/gemma-2-27b-it" target="_blank">gemma-2-27b-it</a></td>
688
+ <td>36.47</td>
689
+ <td>37.04</td>
690
+ <td>35.78</td>
691
+ <td>57.59</td>
692
+ <td>00.67</td>
693
+ <td>13.04</td>
694
+ <td>01.74</td>
695
+ <td>24.63</td>
696
+ <td>05.17</td>
697
+ <td>37.08</td>
698
+ <td>07.36</td>
699
+ <td>42.49</td>
700
+ <td>00.03</td>
701
+ <td>04.94</td>
702
+ <td>11.10</td>
703
+ </tr>
704
+ <tr>
705
+ <td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-27B" target="_blank">Atlas-Chat-27B</a></strong></td>
706
+ <td><b>61.95</b></td>
707
+ <td><b>48.37</b></td>
708
+ <td><b>75.67</b></td>
709
+ <td>73.00</td>
710
+ <td><b>29.55</b></td>
711
+ <td><b>51.74</b></td>
712
+ <td><b>19.66</b></td>
713
+ <td><b>45.65</b></td>
714
+ <td><b>20.34</b></td>
715
+ <td><b>49.19</b></td>
716
+ <td><b>31.61</b></td>
717
+ <td><b>59.37</b></td>
718
+ <td><b>33.03</b></td>
719
+ <td><b>40.95</b></td>
720
+ <td><b>60.70</b></td>
721
+ </tr>
722
 
723
 
724
 
725
  </table>
726
+ -->
727
+
728
+ ## Evaluation
729
+ The Atlas-Chat models were evaluated on a comprehensive suite of tasks using various datasets and benchmarks to assess their performance across multiple dimensions. These included tasks such as:
730
+
731
+ * **DarijaMMLU:** A Darija version of the ArabicMMLU and MMLU benchmarks, translated from MSA and English, respectively.
732
+ * **DarijaHellaSwag:** A Darija version of HellaSwag.
733
+ * **Belebele Ary_Arab:** Belebele is a multiple-choice machine reading comprehension dataset published by Facebook spanning 122 language variants. The evaluation is done on the Ary_Arab part of Belebele, which corresponds to Darija.
734
+ * **DarijaAlpacaEval:** A Darija version of AlpacaEval, translated into Darija and adapted to Moroccan culture.
735
+ * **Sentiment Analysis.**
736
+ * **Translation:** Including six directions and four languages: Darija, MSA, English and French.
737
+ * **Transliteration:** Transforming a sentence from Darija (written in Arabic characters) to Arabizi (written in Latin characters) and vice versa.
738
+ * **Summarization.**
739
 
740
+ The models were compared against a collection of existing open-source Arabic models to gauge their effectiveness, with a particular focus on performance in Darija. All scores are based on zero-shot performance, with prompts written mainly in Darija. The metric used for DarijaMMLU, DarijaHellaSwag, Belebele Ary, and Sentiment Analysis is normalized accuracy. We used the [Language Model Evaluation Harness](https://github.com/MBZUAI-Paris/lm-evaluation-harness-atlas-chat) to conduct these evaluations.
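A minimal sketch of the length-normalized accuracy used for the multiple-choice benchmarks: each answer choice is scored by its total log-likelihood divided by its length, and the highest-scoring choice becomes the prediction. The scores below are made-up numbers, and the harness's exact normalization may differ in detail:

```python
# Sketch of "normalized accuracy" (acc_norm-style scoring) for
# multiple-choice tasks. Each example is (choices, gold_index), where
# choices is a list of (answer_text, total_logprob) pairs. Log-probs
# here are illustrative, not from any real model.

def acc_norm(examples):
    correct = 0
    for choices, gold in examples:
        # Normalize each choice's log-likelihood by its character length,
        # so longer answers are not penalized for having more tokens.
        pred = max(range(len(choices)),
                   key=lambda i: choices[i][1] / len(choices[i][0]))
        correct += (pred == gold)
    return correct / len(examples)

examples = [
    # The long choice wins after length normalization despite a lower raw score.
    ([("نعم", -4.0), ("لا بطبيعة الحال", -9.0)], 1),
    ([("أ", -2.0), ("ب", -1.0)], 0),
]
print(acc_norm(examples))  # one of two examples scored correctly
```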
741
+
742
+ **LLM Benchmarks:**
743
+ <table>
744
+ <tr>
745
+ <td>Model</td>
746
+ <td><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaMMLU" target="_blank">DarijaMMLU</a></td>
747
+ <td><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaHellaSwag" target="_blank">DarijaHellaSwag</a></td>
748
+ <td><a href="https://huggingface.co/datasets/facebook/belebele/viewer/ary_Arab" target="_blank">Belebele Ary</a></td>
749
+ <td><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaAlpacaEval" target="_blank">DarijaAlpacaEval</a></td>
750
+ </tr>
751
+ <tr>
752
+ <td><a href="https://huggingface.co/inceptionai/jais-family-1p3b-chat" target="_blank">jais-family-1p3b-chat</a></td>
753
+ <td>35.39</td>
754
+ <td>32.51</td>
755
+ <td>38.33</td>
756
+ <td>35.56</td>
757
+ </tr>
758
+ <tr>
759
+ <td><a href="https://huggingface.co/inceptionai/jais-family-2p7b-chat" target="_blank">jais-family-2p7b-chat</a></td>
760
+ <td>37.44</td>
761
+ <td>34.49</td>
762
+ <td>44.11</td>
763
+ <td>52.97</td>
764
+ </tr>
765
+ <tr>
766
+ <td><a href="https://huggingface.co/google/gemma-2-2b-it" target="_blank">gemma-2-2b-it</a></td>
767
+ <td>28.58</td>
768
+ <td>32.42</td>
769
+ <td>25.22</td>
770
+ <td>58.67</td>
771
+ </tr>
772
+ <tr>
773
+ <td><a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" target="_blank">Llama-3.2-1B-Instruct</a></td>
774
+ <td>27.66</td>
775
+ <td>26.88</td>
776
+ <td>28.89</td>
777
+ <td>23.57</td>
778
+ </tr>
779
+ <tr>
780
+ <td><a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct" target="_blank">Llama-3.2-3B-Instruct</a></td>
781
+ <td>32.60</td>
782
+ <td>28.33</td>
783
+ <td>38.00</td>
784
+ <td>47.62</td>
785
+ </tr>
786
+ <tr>
787
+ <td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-2B" target="_blank">Atlas-Chat-2B</a></strong></td>
788
+ <td><b>44.97</b></td>
789
+ <td><b>41.48</b></td>
790
+ <td><b>53.89</b></td>
791
+ <td><b>92.31</b></td>
792
+ </tr>
793
+ <tr style="border-top: 4px solid;"></tr>
794
+ <tr>
795
+ <td><a href="https://huggingface.co/inceptionai/jais-family-6p7b-chat" target="_blank">jais-family-6p7b-chat</a></td>
796
+ <td>39.96</td>
797
+ <td>41.57</td>
798
+ <td>51.22</td>
799
+ <td>65.18</td>
800
+ </tr>
801
+ <tr>
802
+ <td><a href="https://huggingface.co/inceptionai/jais-adapted-7b-chat" target="_blank">jais-adapted-7b-chat</a></td>
803
+ <td>39.30</td>
804
+ <td>35.19</td>
805
+ <td>43.67</td>
806
+ <td>61.84</td>
807
+ </tr>
808
+ <tr>
809
+ <td><a href="https://huggingface.co/inceptionai/jais-family-13b-chat" target="_blank">jais-family-13b-chat</a></td>
810
+ <td>45.11</td>
811
+ <td>43.90</td>
812
+ <td>58.67</td>
813
+ <td>69.93</td>
814
+ </tr>
815
+ <tr>
816
+ <td><a href="https://huggingface.co/inceptionai/jais-adapted-13b-chat" target="_blank">jais-adapted-13b-chat</a></td>
817
+ <td>45.20</td>
818
+ <td>40.65</td>
819
+ <td>49.67</td>
820
+ <td>77.52</td>
821
+ </tr>
822
+ <tr>
823
+ <td><a href="https://huggingface.co/FreedomIntelligence/AceGPT-7B-chat" target="_blank">AceGPT-7b-chat</a></td>
824
+ <td>35.98</td>
825
+ <td>36.57</td>
826
+ <td>30.11</td>
827
+ <td>47.31</td>
828
+ </tr>
829
+ <tr>
830
+ <td><a href="https://huggingface.co/FreedomIntelligence/AceGPT-13B-chat" target="_blank">AceGPT-13b-chat</a></td>
831
+ <td>41.09</td>
832
+ <td>38.35</td>
833
+ <td>33.11</td>
834
+ <td>52.79</td>
835
+ </tr>
836
+ <tr>
837
+ <td><a href="https://huggingface.co/google/gemma-2-9b-it" target="_blank">gemma-2-9b-it</a></td>
838
+ <td>35.91</td>
839
+ <td>42.43</td>
840
+ <td>31.00</td>
841
+ <td>90.86</td>
842
+ </tr>
843
+ <tr>
844
+ <td><a href="https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct" target="_blank">Llama-3.1-8B-Instruct</a></td>
845
+ <td>44.13</td>
846
+ <td>38.24</td>
847
+ <td>47.00</td>
848
+ <td>78.08</td>
849
+ </tr>
850
+ <tr>
851
+ <td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-9B" target="_blank">Atlas-Chat-9B</a></strong></td>
852
+ <td><b>58.23</b></td>
853
+ <td><b>57.75</b></td>
854
+ <td><b>74.56</b></td>
855
+ <td><b>95.62</b></td>
856
+ </tr>
857
+ <tr style="border-top: 4px solid;"></tr>
858
+ <tr>
859
+ <td><a href="https://huggingface.co/inceptionai/jais-family-30b-8k-chat" target="_blank">jais-family-30b-8k-chat</a></td>
860
+ <td>51.88</td>
861
+ <td>35.61</td>
862
+ <td>65.67</td>
863
+ <td>24.64</td>
864
+ </tr>
865
+ <tr>
866
+ <td><a href="https://huggingface.co/google/gemma-2-27b-it" target="_blank">gemma-2-27b-it</a></td>
867
+ <td>36.47</td>
868
+ <td>37.04</td>
869
+ <td>35.78</td>
870
+ <td>95.07</td>
871
+ </tr>
872
+ <tr>
873
+ <td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-27B" target="_blank">Atlas-Chat-27B</a></strong></td>
874
+ <td><b>61.95</b></td>
875
+ <td><b>48.37</b></td>
876
+ <td><b>75.67</b></td>
877
+ <td><b>96.58</b></td>
878
+ </tr>
879
+ </table>
880
+
881
+ **Standard NLP Tasks:**
882
+ <table>
883
+ <tr>
884
+ <td rowspan="2">Model</td>
885
+ <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">DODa-10k (Translation)</a></td>
886
+ <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">MADAR (Translation)</a></td>
887
+ <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">FLORES+ (Translation)</a></td>
888
+ <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">NLLB-Seed (Translation)</a></td>
889
+ <td colspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">DODa-10k (Transliteration)</a></td>
890
+ <td rowspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">MArSum (Summarization)</a><br/>(LLM as a judge)</td>
891
+ <td rowspan="2"><a href="https://huggingface.co/datasets/MBZUAI-Paris/DarijaBench" target="_blank">Sentiment Analysis</a></td>
892
+ </tr>
893
+ <tr>
894
+ <td>BLEU</td>
895
+ <td>chrF</td>
896
+ <td>BLEU</td>
897
+ <td>chrF</td>
898
+ <td>BLEU</td>
899
+ <td>chrF</td>
900
+ <td>BLEU</td>
901
+ <td>chrF</td>
902
+ <td>BLEU</td>
903
+ <td>chrF</td>
904
+ </tr>
905
+ <tr>
906
+ <td><a href="https://huggingface.co/inceptionai/jais-family-1p3b-chat" target="_blank">jais-family-1p3b-chat</a></td>
907
+ <td>00.13</td>
908
+ <td>06.18</td>
909
+ <td>00.50</td>
910
+ <td>15.43</td>
911
+ <td>02.44</td>
912
+ <td>19.14</td>
913
+ <td>01.99</td>
914
+ <td>12.60</td>
915
+ <td>00.01</td>
916
+ <td>03.01</td>
917
+ <td>00.50</td>
918
+ <td>45.29</td>
919
+ </tr>
920
+ <tr>
921
+ <td><a href="https://huggingface.co/inceptionai/jais-family-2p7b-chat" target="_blank">jais-family-2p7b-chat</a></td>
922
+ <td>00.25</td>
923
+ <td>07.46</td>
924
+ <td>00.62</td>
925
+ <td>16.36</td>
926
+ <td>04.25</td>
927
+ <td>18.22</td>
928
+ <td>03.10</td>
929
+ <td>08.19</td>
930
+ <td>00.01</td>
931
+ <td>03.27</td>
932
+ <td>00.90</td>
933
+ <td>51.56</td>
934
+ </tr>
935
+ <tr>
936
+ <td><a href="https://huggingface.co/google/gemma-2-2b-it" target="_blank">gemma-2-2b-it</a></td>
937
+ <td>00.10</td>
938
+ <td>04.96</td>
939
+ <td>00.12</td>
940
+ <td>06.66</td>
941
+ <td>01.55</td>
942
+ <td>18.59</td>
943
+ <td>02.78</td>
944
+ <td>23.69</td>
945
+ <td>00.01</td>
946
+ <td>02.08</td>
947
+ <td>06.80</td>
948
+ <td>53.36</td>
949
+ </tr>
950
+ <tr>
951
+ <td><a href="https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct" target="_blank">Llama-3.2-1B-Instruct</a></td>
952
+ <td>00.07</td>
953
+ <td>05.95</td>
954
+ <td>00.80</td>
955
+ <td>18.71</td>
956
+ <td>04.53</td>
957
+ <td>18.39</td>
958
+ <td>04.52</td>
959
+ <td>17.06</td>
960
+ <td>00.02</td>
961
+ <td>03.74</td>
962
+ <td>08.23</td>
963
+ <td>46.27</td>
964
+ </tr>
965
+ <tr>
966
+ <td><a href="https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct" target="_blank">Llama-3.2-3B-Instruct</a></td>
967
+ <td>00.62</td>
968
+ <td>13.67</td>
969
+ <td>01.18</td>
970
+ <td>22.12</td>
971
+ <td>08.59</td>
972
+ <td>35.21</td>
973
+ <td>13.75</td>
974
+ <td>43.63</td>
975
+ <td>00.21</td>
976
+ <td>09.68</td>
977
+ <td>08.23</td>
978
+ <td>49.20</td>
979
+ </tr>
980
+ <tr>
981
+ <td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-2B" target="_blank">Atlas-Chat-2B</a></strong></td>
982
+ <td><b>22.76</b></td>
983
+ <td><b>44.86</b></td>
984
+ <td><b>16.67</b></td>
985
+ <td><b>41.64</b></td>
986
+ <td><b>14.92</b></td>
987
+ <td><b>43.03</b></td>
988
+ <td><b>23.88</b></td>
989
+ <td><b>52.19</b></td>
990
+ <td><b>08.18</b></td>
991
+ <td><b>21.54</b></td>
992
+ <td><b>55.22</b></td>
993
+ <td><b>73.99</b></td>
994
+ </tr>
995
+ <tr style="border-top: 4px solid;"></tr>
996
+ <tr>
997
+ <td><a href="https://huggingface.co/inceptionai/jais-family-6p7b-chat" target="_blank">jais-family-6p7b-chat</a></td>
998
+ <td>00.73</td>
999
+ <td>11.85</td>
1000
+ <td>01.88</td>
1001
+ <td>23.22</td>
1002
+ <td>04.25</td>
1003
+ <td>18.22</td>
1004
+ <td>04.62</td>
1005
+ <td>20.22</td>
1006
+ <td>00.02</td>
1007
+ <td>03.79</td>
1008
+ <td>03.02</td>
1009
+ <td>56.78</td>
1010
+ </tr>
1011
+ <tr>
1012
+ <td><a href="https://huggingface.co/inceptionai/jais-adapted-7b-chat" target="_blank">jais-adapted-7b-chat</a></td>
1013
+ <td>00.60</td>
1014
+ <td>09.43</td>
1015
+ <td>03.45</td>
1016
+ <td>25.88</td>
1017
+ <td>07.25</td>
1018
+ <td>23.21</td>
1019
+ <td>01.25</td>
1020
+ <td>02.22</td>
1021
+ <td>00.04</td>
1022
+ <td>03.24</td>
1023
+ <td>02.82</td>
1024
+ <td>52.72</td>
1025
+ </tr>
1026
+ <tr>
1027
+ <td><a href="https://huggingface.co/inceptionai/jais-family-13b-chat" target="_blank">jais-family-13b-chat</a></td>
1028
+ <td>00.92</td>
1029
+ <td>11.71</td>
1030
+ <td>04.01</td>
1031
+ <td>28.48</td>
1032
+ <td>05.70</td>
1033
+ <td>27.24</td>
1034
+ <td>04.50</td>
1035
+ <td>22.56</td>
1036
+ <td>00.03</td>
1037
+ <td>03.57</td>
1038
+ <td>01.77</td>
1039
+ <td>41.73</td>
1040
+ </tr>
1041
+ <tr>
1042
+ <td><a href="https://huggingface.co/inceptionai/jais-adapted-13b-chat" target="_blank">jais-adapted-13b-chat</a></td>
1043
+ <td>00.87</td>
1044
+ <td>10.52</td>
1045
+ <td>04.02</td>
1046
+ <td>25.29</td>
1047
+ <td>06.66</td>
1048
+ <td>23.46</td>
1049
+ <td>20.14</td>
1050
+ <td>47.87</td>
1051
+ <td>00.04</td>
1052
+ <td>04.77</td>
1053
+ <td>01.92</td>
1054
+ <td>66.68</td>
1055
+ </tr>
1056
+ <tr>
1057
+ <td><a href="https://huggingface.co/FreedomIntelligence/AceGPT-7B-chat" target="_blank">AceGPT-7b-chat</a></td>
1058
+ <td>00.44</td>
1059
+ <td>11.33</td>
1060
+ <td>01.05</td>
1061
+ <td>19.24</td>
1062
+ <td>06.92</td>
1063
+ <td>36.03</td>
1064
+ <td>11.05</td>
1065
+ <td>44.55</td>
1066
+ <td>00.06</td>
1067
+ <td>04.74</td>
1068
+ <td>02.28</td>
1069
+ <td>40.23</td>
1070
+ </tr>
1071
+ <tr>
1072
+ <td><a href="https://huggingface.co/FreedomIntelligence/AceGPT-13B-chat" target="_blank">AceGPT-13b-chat</a></td>
1073
+ <td>00.98</td>
1074
+ <td>16.70</td>
1075
+ <td>00.81</td>
1076
+ <td>20.23</td>
1077
+ <td>08.73</td>
1078
+ <td>40.76</td>
1079
+ <td>14.02</td>
1080
+ <td>48.28</td>
1081
+ <td>00.12</td>
1082
+ <td>06.32</td>
1083
+ <td>02.80</td>
1084
+ <td>59.58</td>
1085
+ </tr>
1086
+ <tr>
1087
+ <td><a href="https://huggingface.co/google/gemma-2-9b-it" target="_blank">gemma-2-9b-it</a></td>
1088
+ <td>03.10</td>
1089
+ <td>19.16</td>
1090
+ <td>01.72</td>
1091
+ <td>24.35</td>
1092
+ <td>05.18</td>
1093
+ <td>36.96</td>
1094
+ <td>08.23</td>
1095
+ <td>43.57</td>
1096
+ <td>00.17</td>
1097
+ <td>09.14</td>
1098
+ <td>13.81</td>
1099
+ <td>59.87</td>
1100
+ </tr>
1101
+ <tr>
1102
+ <td><a href="https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct" target="_blank">Llama-3.1-8B-Instruct</a></td>
1103
+ <td>00.92</td>
1104
+ <td>14.19</td>
1105
+ <td>01.46</td>
1106
+ <td>23.82</td>
1107
+ <td>08.89</td>
1108
+ <td>33.08</td>
1109
+ <td>11.85</td>
1110
+ <td>35.51</td>
1111
+ <td>00.11</td>
1112
+ <td>06.02</td>
1113
+ <td>16.14</td>
1114
+ <td>44.08</td>
1115
+ </tr>
1116
+ <tr>
1117
+ <td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-9B" target="_blank">Atlas-Chat-9B</a></strong></td>
1118
+ <td><b>28.08</b></td>
1119
+ <td><b>50.48</b></td>
1120
+ <td><b>18.16</b></td>
1121
+ <td><b>43.91</b></td>
1122
+ <td><b>18.63</b></td>
1123
+ <td><b>47.53</b></td>
1124
+ <td><b>29.98</b></td>
1125
+ <td><b>58.26</b></td>
1126
+ <td><b>22.08</b></td>
1127
+ <td><b>34.17</b></td>
1128
+ <td><b>59.76</b></td>
1129
+ <td><b>81.89</b></td>
1130
+ </tr>
1131
+ <tr style="border-top: 4px solid;"></tr>
1132
+ <tr>
1133
+ <td><a href="https://huggingface.co/inceptionai/jais-family-30b-8k-chat" target="_blank">jais-family-30b-8k-chat</a></td>
1134
+ <td>01.10</td>
1135
+ <td>14.40</td>
1136
+ <td>01.67</td>
1137
+ <td>23.37</td>
1138
+ <td>08.52</td>
1139
+ <td>35.41</td>
1140
+ <td>13.71</td>
1141
+ <td>41.33</td>
1142
+ <td>00.05</td>
1143
+ <td>04.48</td>
1144
+ <td>00.46</td>
1145
+ <td>56.73</td>
1146
+ </tr>
1147
+ <tr>
1148
+ <td><a href="https://huggingface.co/google/gemma-2-27b-it" target="_blank">gemma-2-27b-it</a></td>
1149
+ <td>00.67</td>
1150
+ <td>13.04</td>
1151
+ <td>01.74</td>
1152
+ <td>24.63</td>
1153
+ <td>05.17</td>
1154
+ <td>37.08</td>
1155
+ <td>07.36</td>
1156
+ <td>42.49</td>
1157
+ <td>00.03</td>
1158
+ <td>04.94</td>
1159
+ <td>11.10</td>
1160
+ <td>57.59</td>
1161
+ </tr>
1162
+ <tr>
1163
+ <td><strong><a href="https://huggingface.co/MBZUAI-Paris/Atlas-Chat-27B" target="_blank">Atlas-Chat-27B</a></strong></td>
1164
+ <td><b>29.55</b></td>
1165
+ <td><b>51.74</b></td>
1166
+ <td><b>19.66</b></td>
1167
+ <td><b>45.65</b></td>
1168
+ <td><b>20.34</b></td>
1169
+ <td><b>49.19</b></td>
1170
+ <td><b>31.61</b></td>
1171
+ <td><b>59.37</b></td>
1172
+ <td><b>33.03</b></td>
1173
+ <td><b>40.95</b></td>
1174
+ <td><b>60.70</b></td>
1175
+ <td>73.00</td>
1176
+ </tr>
1177
+
1178
+
1179
+
1180
+ </table>
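For reference, the chrF columns above are character n-gram F-scores. A simplified toy version (character n-grams up to 3, uniform weights, beta = 2, whitespace stripped) is sketched below; it is not the sacrebleu implementation used for the table:

```python
# Toy chrF-style score: character n-gram F-score between a hypothesis
# and a reference. Simplified sketch only; the reported numbers come
# from a standard implementation such as sacrebleu.
from collections import Counter

def char_ngrams(text, n):
    text = text.replace(" ", "")  # chrF ignores whitespace by default
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hyp, ref, max_n=3, beta=2.0):
    f_scores = []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if sum(h.values()) == 0 or sum(r.values()) == 0:
            continue
        overlap = sum((h & r).values())  # clipped n-gram matches
        prec = overlap / sum(h.values())
        rec = overlap / sum(r.values())
        if prec + rec == 0:
            f_scores.append(0.0)
        else:
            # F-beta score; beta = 2 weights recall twice as much as precision.
            f_scores.append((1 + beta**2) * prec * rec / (beta**2 * prec + rec))
    return 100 * sum(f_scores) / len(f_scores) if f_scores else 0.0

print(round(chrf("the cat sat", "the cat sat"), 2))  # identical strings score 100.0
```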
1181
 
1182
  ## Usage and Limitations
1183