Commit 95249cc by AItool · verified · 1 Parent(s): 5b6a646

Upload content.py

Files changed (1)
  1. content.py +7 -32
content.py CHANGED
@@ -1,57 +1,32 @@
- TITLE = '<h1 align="center" id="space-title">Open Multilingual LLM Evaluation Leaderboard</h1>'
+ TITLE = '<h1 align="center" id="space-title">Open Multilingual Basque LLM Evaluation Leaderboard</h1><img src="basque.JPG">'
 
  INTRO_TEXT = f"""
  ## About
-
  This leaderboard tracks progress and ranks performance of large language models (LLMs) developed for different languages,
  emphasizing on non-English languages to democratize benefits of LLMs to broader society.
- Our current leaderboard provides evaluation data for 29 languages, i.e.,
- Arabic, Armenian, Basque, Bengali, Catalan, Chinese, Croatian, Danish, Dutch,
- French, German, Gujarati, Hindi, Hungarian, Indonesian, Italian, Kannada, Malayalam,
- Marathi, Nepali, Portuguese, Romanian, Russian, Serbian, Slovak, Spanish, Swedish,
- Tamil, Telugu, Ukrainian, and Vietnamese, that will be expanded along the way.
- Both multilingual and language-specific LLMs are welcome in this leaderboard.
- We currently evaluate models over four benchmarks:
-
- - <a href="https://arxiv.org/abs/1803.05457" target="_blank"> AI2 Reasoning Challenge </a> (25-shot)
- - <a href="https://arxiv.org/abs/1905.07830" target="_blank"> HellaSwag </a> (0-shot)
- - <a href="https://arxiv.org/abs/2009.03300" target="_blank"> MMLU </a> (25-shot)
- - <a href="https://arxiv.org/abs/2109.07958" target="_blank"> TruthfulQA </a> (0-shot)
-
- The evaluation data was translated into these languages using ChatGPT (gpt-35-turbo).
-
+ Our current leaderboard provides evaluation data for Basque.
  """
 
  HOW_TO = f"""
  ## How to list your model performance on this leaderboard:
-
- Run the evaluation of your model using this repo: <a href="https://github.com/nlp-uoregon/mlmm-evaluation" target="_blank">https://github.com/nlp-uoregon/mlmm-evaluation</a>.
-
+ Run the evaluation of your model using this repo: <a href="https://github.com/webdevserv/mlmm_basque_evaluation" target="_blank">mlmm_basque_evaluation</a>.
  And then, push the evaluation log and make a pull request.
  """
 
  CREDIT = f"""
  ## Credit
-
  To make this website, we use the following resources:
-
- - Datasets (AI2_ARC, HellaSwag, MMLU, TruthfulQA)
- - Funding and GPU access (Adobe Research)
- - Evaluation code (EleutherAI's lm_evaluation_harness repo)
  - Leaderboard code (Huggingface4's open_llm_leaderboard repo)
-
  """
 
 
  CITATION = f"""
  ## Citation
-
  ```
-
  @misc{{lai2023openllmbenchmark,
- author = {{Viet Lai and Nghia Trung Ngo and Amir Pouran Ben Veyseh and Franck Dernoncourt and Thien Huu Nguyen}},
- title={{Open Multilingual LLM Evaluation Leaderboard}},
- year={{2023}}
+ author = {{Idoia Lertxundi, thanks to Viet Lai and Nghia Trung Ngo and Amir Pouran Ben Veyseh and Franck Dernoncourt and Thien Huu Nguyen}},
+ title={{Open Basque LLM Evaluation Leaderboard}},
+ year={{2024}}
  }}
  ```
- """
+ """
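
Review note: `CITATION` is declared as an f-string, so the doubled braces in the BibTeX entry (`{{` and `}}`) are escapes that render as single braces. The sketch below is a trimmed, hypothetical copy of that constant, not the full file, and `has_balanced_braces` is a helper invented here for illustration. It shows one way to confirm that the rendered text has no leftover doubled braces and that the braces still pair up.

```python
# Hypothetical, trimmed copy of the CITATION constant from content.py.
# Inside an f-string, "{{" renders as "{" and "}}" renders as "}".
CITATION = f"""
@misc{{lai2023openllmbenchmark,
  title={{Open Basque LLM Evaluation Leaderboard}},
  year={{2024}}
}}
"""


def has_balanced_braces(text: str) -> bool:
    """Return True if every '{' in text is closed by a later '}'."""
    depth = 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a closing brace appeared before its opener
                return False
    return depth == 0


if __name__ == "__main__":
    print(CITATION)
    print("balanced:", has_balanced_braces(CITATION))
```

If the `f` prefix were ever dropped from the string, the doubled braces would appear literally in the rendered page, which is the case this check is meant to catch.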