kaz-llm-lb


kz-transformers committed on Dec 11, 2024

Commit 2384501 (verified) · 1 Parent(s): e74bbec

Update src/display/about.py

Files changed (1)

src/display/about.py +14 -43

@@ -1,6 +1,6 @@
 from src.display.utils import ModelType
-TITLE = """<h1 style="text-align:left;float:left; id="space-title">🤗 Small Shlepa LLM Leaderboard</h1> <h3 style="text-align:left;float:left;> Track, rank and evaluate open LLMs and chatbots </h3>"""
 INTRODUCTION_TEXT = """
 """
@@ -16,18 +16,18 @@ icons = f"""
 LLM_BENCHMARKS_TEXT = """
 ## En:
-Small Shlepa is a benchmark for LLM with multiple-choice tasks on the following topics:
-- Complex interdisciplinary questions (MMLUpro-ru)
-- Laws of the Russian Federation (lawmc)
-- Popular music (musicmc)
-- Books (bookmc)
-- Movies (moviemc)
-Each task contains 12 answer choices, mmlupro-ru has 10.
 ## Instructions for Use
 ### Installation
 To install the necessary library, run the following command:
 ```bash
-pip install git+https://github.com/VikhrModels/lm_eval_mc.git --upgrade --force-reinstall --no-deps
 ```
 ### Execution
 To run the benchmark, use the following command:
@@ -35,10 +35,10 @@ To run the benchmark, use the following command:
 !lm_eval \
     --model hf \
     --model_args pretrained={hf/model},dtype=float16 \
-    --batch_size 8 \
     --apply_chat_template \
     --num_fewshot 0 \
-    --tasks musicmc,moviemc,bookmc,lawmc,mmluproru \
     --output output
 ```
 ### Results
@@ -47,38 +47,9 @@ After executing the above command, a JSON file will be created in the `output` d
 If cheating or attempts to modify the output file are detected, we reserve the right to delete your submission.
 Thank you for participating!
-## Ru:
-Small Shlepa is a benchmark for LLMs with multiple-choice (multichoice) tasks on the following topics:
-- Complex interdisciplinary questions (MMLUpro-ru)
-- Laws of the Russian Federation (lawmc)
-- Popular music (musicmc)
-- Books (bookmc)
-- Movies (moviemc)
-Each task contains 12 answer choices; mmlupro-ru has 10.
-## Instructions for Use
-### Installation
-To install the necessary library, run the following command:
-```bash
-pip install git+https://github.com/VikhrModels/lm_eval_mc.git --upgrade --force-reinstall --no-deps
-```
-### Execution
-To run the benchmark, use the following command:
-```bash
-!lm_eval \
-    --model hf \
-    --model_args pretrained={hf/model},dtype=float16 \
-    --batch_size 8 \
-    --apply_chat_template \
-    --num_fewshot 0 \
-    --tasks musicmc,moviemc,bookmc,lawmc,mmluproru \
-    --output output
-```
-### Results
-After running the command above, a JSON file will be created in the `output` directory; it must be attached. This file contains the task results and a description of the session, and it **must not be modified**.
-## Anti-Cheating Policy
-If cheating or attempts to modify the output file are detected, we reserve the right to delete your submission.
-Thank you for participating!
 Cite: @misc{aleks2024vikhr,
     title={Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian},

 from src.display.utils import ModelType
+TITLE = """<h1 style="text-align:left;float:left; id="space-title">🤗 Kaz LLM Leaderboard</h1> <h3 style="text-align:left;float:left;> Track, rank and evaluate open LLMs and chatbots </h3>"""
 INTRODUCTION_TEXT = """
 """
 LLM_BENCHMARKS_TEXT = """
 ## En:
+Kaz LLM is a benchmark for LLM with multiple-choice tasks on the following topics:
+- mmlu-translated-kk
+- gsm8k-kk-translated
+- kazakh-unified-national-testing-mc
+- kazakh-constitution-mc
+- kazakh-dastur-mc
+Each task contains 4 answer choices, mmlu-translated-kk has ??.
 ## Instructions for Use
 ### Installation
 To install the necessary library, run the following command:
 ```bash
+pip install git+https://github.com/horde-research/kaz-llm-eval-lb.git --upgrade --force-reinstall --no-deps
 ```
 ### Execution
 To run the benchmark, use the following command:
 !lm_eval \
     --model hf \
     --model_args pretrained={hf/model},dtype=float16 \
+    --batch_size 1 \
     --apply_chat_template \
     --num_fewshot 0 \
+    --tasks kazakh-dastur-mc \
     --output output
 ```
 ### Results
 If cheating or attempts to modify the output file are detected, we reserve the right to delete your submission.
 Thank you for participating!
+## KZ:
+to be filled
 Cite: @misc{aleks2024vikhr,
     title={Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian},
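
The Results section of the file says the run leaves a JSON file in the `output` directory. As a rough sketch for inspecting that file locally without changing it, the Python below reads it and lists the numeric metrics per task. It assumes the file keeps the upstream lm-evaluation-harness layout, with a top-level `results` object keyed by task name; the output of the kaz-llm-eval-lb fork may differ, and both function names are hypothetical.

```python
import json
from pathlib import Path


def extract_task_metrics(data):
    """Return {task: {metric: value}} for the numeric metrics of each task."""
    # Assumption: "results" maps task names to metric dictionaries,
    # as in upstream lm-evaluation-harness output.
    metrics = {}
    for task, values in data.get("results", {}).items():
        metrics[task] = {
            name: value
            for name, value in values.items()
            # Skip non-numeric entries such as aliases; bool is excluded
            # explicitly because it is a subclass of int.
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        }
    return metrics


def load_task_metrics(path):
    """Read a results file (read-only) and summarise it per task."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return extract_task_metrics(data)
```

Calling `load_task_metrics` on the generated file returns a plain dictionary that can be printed or compared across runs. The file is only read, never rewritten, so the copy attached to a submission stays unmodified.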