Ali-C137 committed
Commit 0f8b93f • 1 parent: 3f72d50

Update src/about.py


Adds `0-shots` and `loglikelihood_acc_norm` as the evaluation setting and metric used in the "How it works" section, and updates the TITLE & BOTTOM_LOGO assets.

Files changed (1): src/about.py (+9 −4)
src/about.py CHANGED

@@ -35,14 +35,14 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 # Your leaderboard name
-TITLE = """<h1 align="center" id="space-title">Open Arabic LLM Leaderboard</h1>"""
-# TITLE = """<img src="./OALL-logo-nobg.png" style="width:30%;display:block;margin-left:auto;margin-right:auto">"""
+# TITLE = """<h1 align="center" id="space-title">Open Arabic LLM Leaderboard</h1>"""
+TITLE = """<img src="https://raw.githubusercontent.com/alielfilali01/OALL-assets/main/TITLE.png" style="width:30%;display:block;margin-left:auto;margin-right:auto;border-radius:15px;">"""
 
-# BOTTOM_LOGO = """<img src="https://huggingface.co/spaces/OALL/Open-Arabic-LLM-Leaderboard/blob/main/assets/footer_logo.png" style="width:50%;display:block;margin-left:auto;margin-right:auto">"""
+BOTTOM_LOGO = """<img src="https://raw.githubusercontent.com/alielfilali01/OALL-assets/main/BOTTOM.png" style="width:50%;display:block;margin-left:auto;margin-right:auto;border-radius:15px;">"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-🚀 The Open Arabic LLM Leaderboard : Evaluate and compare the performance of Arabic Large Language Models (LLMs).
+🌴 The Open Arabic LLM Leaderboard : Evaluate and compare the performance of Arabic Large Language Models (LLMs).
 
 
 When you submit a model on the "Submit here!" page, it is automatically evaluated on a set of benchmarks.
@@ -90,6 +90,11 @@ And here find all the translated benchmarks provided by the Language evaluation
 
 - `Arabic-MMLU`, `Arabic-EXAMS`, `Arabic-ARC-Challenge`, `Arabic-ARC-Easy`, `Arabic-BOOLQ`, `Arabic-COPA`, `Arabic-HELLASWAG`, `Arabic-OPENBOOK-QA`, `Arabic-PIQA`, `Arabic-RACE`, `Arabic-SCIQ`, `Arabic-TOXIGEN`. All part of the extended version of the AlGhafa benchmark (AlGhafa-T version)
 
+
+To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in zero-shot settings `0-shots`. This approach eliminates any potential advantage from task-specific fine-tuning, providing a clear indication of how well the models can generalize to new tasks.
+Also, given the nature of the tasks, which include multiple-choice and yes/no questions, the leaderboard primarily uses normalized log likelihood accuracy `loglikelihood_acc_norm` for all tasks. This metric was chosen for its ability to provide a clear and fair measurement of model performance across different types of questions.
+
+
 Please, consider reaching out to us through the discussions tab if you are working on benchmarks for Arabic LLMs and willing to see them on this leaderboard as well. Your benchmark might change the whole game for Arabic models !
 
 GPUs are provided by __[Technology Innovation Institute (TII)](https://www.tii.ae/)__ for the evaluations.
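
For context on the metric this commit documents: `loglikelihood_acc_norm` scores a multiple-choice item by picking the answer choice whose log-likelihood, normalized by the choice's length, is highest. The sketch below is an assumption-level illustration of that idea, not the leaderboard's actual evaluation code; the function name, argument names, and character-length normalization are choices made here for clarity (harness implementations may normalize by bytes or tokens instead).

```python
def loglikelihood_acc_norm(choice_logprobs, choice_texts, gold_index):
    """Length-normalized log-likelihood accuracy for one multiple-choice item.

    choice_logprobs: summed log-probability the model assigns to each answer choice.
    choice_texts:    the answer-choice strings (used only for length normalization).
    gold_index:      index of the correct choice.
    Returns 1.0 if the choice with the highest per-character log-likelihood
    is the gold one, else 0.0.
    """
    # Normalize each choice's total log-prob by its character length,
    # so longer choices are not penalized for having more tokens to score.
    normed = [lp / max(len(text), 1) for lp, text in zip(choice_logprobs, choice_texts)]
    pred = max(range(len(normed)), key=normed.__getitem__)
    return 1.0 if pred == gold_index else 0.0
```

The normalization matters when choices differ in length: a long correct answer with total log-prob −10.0 over 20 characters (−0.5 per character) beats a one-character distractor at −4.0, even though the distractor's raw log-prob is higher.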