Update src/about.py
Adding `0-shots` and `loglikelihood_acc_norm` as the evaluation setting and metric used in the "How it works" section + update of TITLE & BOTTOM.
src/about.py CHANGED (+9, -4)
@@ -35,14 +35,14 @@ NUM_FEWSHOT = 0 # Change with your few shot
 
 
 # Your leaderboard name
-TITLE = """<h1 align="center" id="space-title">Open Arabic LLM Leaderboard</h1>"""
-
+# TITLE = """<h1 align="center" id="space-title">Open Arabic LLM Leaderboard</h1>"""
+TITLE = """<img src="https://raw.githubusercontent.com/alielfilali01/OALL-assets/main/TITLE.png" style="width:30%;display:block;margin-left:auto;margin-right:auto;border-radius:15px;">"""
 
-
+BOTTOM_LOGO = """<img src="https://raw.githubusercontent.com/alielfilali01/OALL-assets/main/BOTTOM.png" style="width:50%;display:block;margin-left:auto;margin-right:auto;border-radius:15px;">"""
 
 # What does your leaderboard evaluate?
 INTRODUCTION_TEXT = """
-
+🔴 The Open Arabic LLM Leaderboard : Evaluate and compare the performance of Arabic Large Language Models (LLMs).
 
 
 When you submit a model on the "Submit here!" page, it is automatically evaluated on a set of benchmarks.
@@ -90,6 +90,11 @@ And here find all the translated benchmarks provided by the Language evaluation
 
 - `Arabic-MMLU`, `Arabic-EXAMS`, `Arabic-ARC-Challenge`, `Arabic-ARC-Easy`, `Arabic-BOOLQ`, `Arabic-COPA`, `Arabic-HELLASWAG`, `Arabic-OPENBOOK-QA`, `Arabic-PIQA`, `Arabic-RACE`, `Arabic-SCIQ`, `Arabic-TOXIGEN`. All part of the extended version of the AlGhafa benchmark (AlGhafa-T version)
 
+
+To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in zero-shot settings `0-shots`. This approach eliminates any potential advantage from task-specific fine-tuning, providing a clear indication of how well the models can generalize to new tasks.
+Also, given the nature of the tasks, which include multiple-choice and yes/no questions, the leaderboard primarily uses normalized log likelihood accuracy `loglikelihood_acc_norm` for all tasks. This metric was chosen for its ability to provide a clear and fair measurement of model performance across different types of questions.
+
+
 Please, consider reaching out to us through the discussions tab if you are working on benchmarks for Arabic LLMs and willing to see them on this leaderboard as well. Your benchmark might change the whole game for Arabic models !
 
 GPUs are provided by __[Technology Innovation Institute (TII)](https://www.tii.ae/)__ for the evaluations.
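For readers unfamiliar with how module-level strings like the new TITLE and BOTTOM_LOGO end up on the page: below is a hypothetical sketch of how a Gradio Space typically renders them. The app.py layout is an assumption for illustration, not this Space's actual code; only the TITLE, BOTTOM_LOGO, and INTRODUCTION_TEXT names come from the diff above.

# Hypothetical app.py excerpt (illustrative only): rendering the HTML
# strings defined in src/about.py with Gradio components.
import gradio as gr

from src.about import TITLE, BOTTOM_LOGO, INTRODUCTION_TEXT  # assumed module layout

demo = gr.Blocks()
with demo:
    gr.HTML(TITLE)                 # raw HTML, so the <img> banner is rendered
    gr.Markdown(INTRODUCTION_TEXT) # markdown intro text
    # ... leaderboard tabs and submission form would go here ...
    gr.HTML(BOTTOM_LOGO)           # bottom banner image

if __name__ == "__main__":
    demo.launch()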
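For context on the metric this commit documents: here is a minimal sketch of length-normalized loglikelihood accuracy, assuming the evaluation harness supplies a loglikelihood(context, continuation) score for each answer choice. The function name, the character-based normalization, and the toy scorer are illustrative assumptions, not the leaderboard's actual implementation (real harnesses such as lighteval may normalize by token or byte length instead).

# Illustrative sketch of "loglikelihood_acc_norm": score every choice by
# its log-probability given the context, normalized by choice length, and
# check whether the best-scoring choice is the gold answer. All names here
# are hypothetical; a real harness queries the LLM for the log-probs.
from typing import Callable, Sequence

def loglikelihood_acc_norm(
    context: str,
    choices: Sequence[str],
    gold_index: int,
    loglikelihood: Callable[[str, str], float],
) -> bool:
    # Normalize by character count so long answers are not penalized
    # merely for containing more tokens.
    scores = [
        loglikelihood(context, choice) / max(len(choice), 1)
        for choice in choices
    ]
    predicted = max(range(len(choices)), key=scores.__getitem__)
    return predicted == gold_index

if __name__ == "__main__":
    # Toy zero-shot usage with a fake scorer (hypothetical log-probs);
    # the context contains no solved examples, matching the `0-shots` setup.
    fake_scores = {"نعم": -2.0, "لا": -5.0}
    scorer = lambda ctx, choice: fake_scores[choice]
    print(loglikelihood_acc_norm("سؤال: ...؟", ["نعم", "لا"], 0, scorer))  # True

Because every choice is scored against the same zero-shot context and the score is divided by the answer's length, the argmax comparison stays fair across answers of different lengths, which is the rationale the commit text gives for preferring `loglikelihood_acc_norm` on multiple-choice and yes/no tasks.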