Update src/about.py
src/about.py CHANGED (+13 −3)
@@ -37,7 +37,7 @@ TITLE = f"""
 INTRODUCTION_TEXT = """
 Persian LLM Leaderboard is designed to be a challenging benchmark and provide a reliable evaluation of LLMs in Persian Language.
 
-Note: This is a demo version of the leaderboard.
+Note: This is a demo version of the leaderboard. Two new benchmarks are introduced: *PeKA* and *PersBETS*, challenging the native knowledge of the models along with
 linguistic skills and their level of bias, ethics, and trustworthiness. **These datasets are not yet public, but they will be uploaded onto huggingface along with a detailed paper
 explaining the data and performance of relevent models.**
 
@@ -54,7 +54,13 @@ To reproduce our results, here is the commands you can run:
 """
 
 EVALUATION_QUEUE_TEXT = """
-
+
+Right now, the models added **are not automatically evaluated**. We may support automatic evaluation in the future on our own clusters.
+An evaluation framework will be available in the future to help reproduce the results.
+
+## Don't forget to read the FAQ and the About tabs for more information!
+
+## First steps before submitting a model
 
 ### 1) Make sure you can load your model and tokenizer using AutoClasses:
 ```python
@@ -79,7 +85,11 @@ When we add extra information about models to the leaderboard, it will be automa
 ## In case of model failure
 If your model is displayed in the `FAILED` category, its execution stopped.
 Make sure you have followed the above steps first.
-
+
+### 5) Select the correct precision
+Not all models are converted properly from `float16` to `bfloat16`, and selecting the wrong precision can sometimes cause evaluation error (as loading a `bf16` model in `fp16` can sometimes generate NaNs, depending on the weight range).
+
+
 """
 
 CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
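The new "### 5) Select the correct precision" step can likewise be sanity-checked locally. A minimal sketch, assuming a causal LM; the checkpoint name is a placeholder and the NaN scan is illustrative, not part of the evaluation queue:

```python
import torch
from transformers import AutoModelForCausalLM

# Load in the precision you intend to select on the submission form.
# bf16 weights cast to fp16 can overflow to inf/NaN, because float16
# has a far smaller exponent range than bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",           # placeholder checkpoint
    torch_dtype=torch.float16,  # the precision you plan to submit
)

# Flag any tensors that became NaN/inf after the cast.
bad = [name for name, p in model.named_parameters()
       if not torch.isfinite(p).all()]
print("problem tensors:", bad or "none")
```

If any tensors are flagged, submit in `bfloat16` (or the checkpoint's native dtype) rather than `float16`.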