Update src/about.py
Adding `0-shots` and `loglikelihood_acc_norm` as the evaluation setting and metric used in the "How it works" section + update of TITLE & BOTTOM.
src/about.py CHANGED (+9, -4)
@@ -35,14 +35,14 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 # Your leaderboard name
-TITLE = """<h1 align="center" id="space-title">Open Arabic LLM Leaderboard</h1>"""
-
+# TITLE = """<h1 align="center" id="space-title">Open Arabic LLM Leaderboard</h1>"""
+TITLE = """<img src="https://raw.githubusercontent.com/alielfilali01/OALL-assets/main/TITLE.png" style="width:30%;display:block;margin-left:auto;margin-right:auto;border-radius:15px;">"""
 
-
+BOTTOM_LOGO = """<img src="https://raw.githubusercontent.com/alielfilali01/OALL-assets/main/BOTTOM.png" style="width:50%;display:block;margin-left:auto;margin-right:auto;border-radius:15px;">"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-
+🔴 The Open Arabic LLM Leaderboard : Evaluate and compare the performance of Arabic Large Language Models (LLMs).
 
 
 When you submit a model on the "Submit here!" page, it is automatically evaluated on a set of benchmarks.
@@ -90,6 +90,11 @@ And here find all the translated benchmarks provided by the Language evaluation
 
 - `Arabic-MMLU`, `Arabic-EXAMS`, `Arabic-ARC-Challenge`, `Arabic-ARC-Easy`, `Arabic-BOOLQ`, `Arabic-COPA`, `Arabic-HELLASWAG`, `Arabic-OPENBOOK-QA`, `Arabic-PIQA`, `Arabic-RACE`, `Arabic-SCIQ`, `Arabic-TOXIGEN`. All part of the extended version of the AlGhafa benchmark (AlGhafa-T version)
 
+
+To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in zero-shot settings `0-shots`. This approach eliminates any potential advantage from task-specific fine-tuning, providing a clear indication of how well the models can generalize to new tasks.
+Also, given the nature of the tasks, which include multiple-choice and yes/no questions, the leaderboard primarily uses normalized log likelihood accuracy `loglikelihood_acc_norm` for all tasks. This metric was chosen for its ability to provide a clear and fair measurement of model performance across different types of questions.
+
+
 Please, consider reaching out to us through the discussions tab if you are working on benchmarks for Arabic LLMs and willing to see them on this leaderboard as well. Your benchmark might change the whole game for Arabic models !
 
 GPUs are provided by __[Technology Innovation Institute (TII)](https://www.tii.ae/)__ for the evaluations.
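For readers unfamiliar with how module-level strings like the new TITLE and BOTTOM_LOGO end up on the page: below is a hypothetical sketch of how a Gradio Space typically renders them. The app.py layout is an assumption for illustration, not this Space's actual code; only the TITLE, BOTTOM_LOGO, and INTRODUCTION_TEXT names come from the diff above.

# Hypothetical app.py excerpt (illustrative only): rendering the HTML
# strings defined in src/about.py with Gradio components.
import gradio as gr

from src.about import TITLE, BOTTOM_LOGO, INTRODUCTION_TEXT  # assumed module layout

demo = gr.Blocks()
with demo:
    gr.HTML(TITLE)                 # raw HTML, so the <img> banner is rendered
    gr.Markdown(INTRODUCTION_TEXT) # markdown intro text
    # ... leaderboard tabs and submission form would go here ...
    gr.HTML(BOTTOM_LOGO)           # bottom banner image

if __name__ == "__main__":
    demo.launch()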
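For context on the metric this commit documents: here is a minimal sketch of length-normalized loglikelihood accuracy, assuming the evaluation harness supplies a loglikelihood(context, continuation) score for each answer choice. The function name, the character-based normalization, and the toy scorer are illustrative assumptions, not the leaderboard's actual implementation (real harnesses such as lighteval may normalize by token or byte length instead).

# Illustrative sketch of "loglikelihood_acc_norm": score every choice by
# its log-probability given the context, normalized by choice length, and
# check whether the best-scoring choice is the gold answer. All names here
# are hypothetical; a real harness queries the LLM for the log-probs.
from typing import Callable, Sequence

def loglikelihood_acc_norm(
    context: str,
    choices: Sequence[str],
    gold_index: int,
    loglikelihood: Callable[[str, str], float],
) -> bool:
    # Normalize by character count so long answers are not penalized
    # merely for containing more tokens.
    scores = [
        loglikelihood(context, choice) / max(len(choice), 1)
        for choice in choices
    ]
    predicted = max(range(len(choices)), key=scores.__getitem__)
    return predicted == gold_index

if __name__ == "__main__":
    # Toy zero-shot usage with a fake scorer (hypothetical log-probs);
    # the context contains no solved examples, matching the `0-shots` setup.
    fake_scores = {"نعم": -2.0, "لا": -5.0}
    scorer = lambda ctx, choice: fake_scores[choice]
    print(loglikelihood_acc_norm("سؤال: ...؟", ["نعم", "لا"], 0, scorer))  # True

Because every choice is scored against the same zero-shot context and the score is divided by the answer's length, the argmax comparison stays fair across answers of different lengths, which is the rationale the commit text gives for preferring `loglikelihood_acc_norm` on multiple-choice and yes/no tasks.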