Update README.md
README.md
CHANGED
@@ -4,7 +4,7 @@ emoji: 🏆
colorFrom: blue
colorTo: purple
sdk: gradio
-sdk_version: 5.
+sdk_version: 5.3.0
app_file: app.py
pinned: false
license: mit

@@ -87,36 +87,74 @@ You can customize the arena by modifying the `arena_config.py` file:

The leaderboard data is stored in `leaderboard.json`. This file is automatically updated after each battle.

-### Scoring System
-We use a
-This formula balances win rate with the number of battles, giving more weight to models that have participated in more battles.
-The

### Main Leaderboard Scoring System

We use a scoring system to rank the models fairly. The score for each model is calculated using the following formula:

```
Score = Win Rate * (1 - 1 / (Total Battles + 1))
```

Let's break down this formula:

1. **Win Rate**: This is the number of wins divided by the total number of battles. It ranges from 0 (no wins) to 1 (all wins).

2. **1 - 1 / (Total Battles + 1)**: This factor adjusts the win rate based on the number of battles:
   - We add 1 to the total battles to avoid division by zero and to ensure that even with just one battle, the score isn't discounted too heavily.
   - As the number of battles increases, this factor approaches 1.
   - For example:
     - With 1 battle: 1 - 1/2 = 0.5
     - With 10 battles: 1 - 1/11 ≈ 0.91
     - With 100 battles: 1 - 1/101 ≈ 0.99

3. **Purpose of this adjustment**:
   - It gives more weight to models that have participated in more battles.
   - A model with a high win rate but few battles will have a lower score than a model with the same win rate but more battles.
   - This encourages models to participate in more battles to improve their score.

4. **How it works in practice**:
   - For a new model with just one battle, its score will be at most 50% of its win rate.
   - As the model participates in more battles, its score will approach its actual win rate.
   - This prevents models with very few battles from dominating the leaderboard based on lucky wins.

In essence, this formula balances two factors:
1. How well a model performs (win rate)
2. How much experience it has (total battles)

It ensures that the leaderboard favors models that consistently perform well over a larger number of battles, rather than those that might have a high win rate from just a few lucky encounters.
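
In Python, the calculation could look something like this minimal sketch (the helper name `calculate_score` is illustrative, not necessarily the arena's actual code):

```python
def calculate_score(wins: int, total_battles: int) -> float:
    """Score = Win Rate * (1 - 1 / (Total Battles + 1))."""
    if total_battles == 0:
        return 0.0  # nothing to rank yet
    win_rate = wins / total_battles
    experience_factor = 1 - 1 / (total_battles + 1)
    return win_rate * experience_factor

# Worked examples from above (all with a 100% win rate):
# calculate_score(1, 1)     -> 0.5
# calculate_score(10, 10)   -> ~0.909
# calculate_score(100, 100) -> ~0.990
```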

We sort the results primarily by this calculated score, and secondarily by the total number of battles. This ensures that models with similar scores are ranked by their experience (number of battles).

The leaderboard displays this calculated score alongside wins, losses, and other statistics.
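
A sketch of that two-level sort follows; the entry fields below are hypothetical (the real `leaderboard.json` structure may differ) and only serve to illustrate the ordering:

```python
# Hypothetical leaderboard entries; field names are illustrative only.
leaderboard = [
    {"model": "model-a", "wins": 9,  "losses": 1, "battles": 10},
    {"model": "model-b", "wins": 52, "losses": 8, "battles": 60},
]

for entry in leaderboard:
    win_rate = entry["wins"] / entry["battles"]
    entry["score"] = win_rate * (1 - 1 / (entry["battles"] + 1))

# Primary key: score (descending); secondary key: total battles (descending).
ranked = sorted(leaderboard, key=lambda e: (e["score"], e["battles"]), reverse=True)
```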

### ELO Leaderboard

In addition to the main leaderboard, we also maintain an ELO-based leaderboard:

- Models start with an initial ELO rating based on their size.
- ELO ratings are updated after each battle, with adjustments made based on the size difference between models (see the sketch below).
- The ELO leaderboard provides an alternative perspective on model performance, taking into account the relative strengths of opponents.
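
This README does not spell out the exact initial ratings or the size-based adjustment, so the following is only a generic Elo update sketch; the K-factor and how model size would scale it are assumptions for illustration:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return both ratings after one battle (K could be scaled by the size difference)."""
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (actual_a - expected_a)
    new_b = rating_b + k * ((1.0 - actual_a) - (1.0 - expected_a))
    return new_a, new_b
```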

## 🤖 Models

The arena currently supports the following compact models:

- LLaMA 3.2 (1B, 3B, 8-bit)
- LLaMA 3.1 (8B, 4-bit)
- Gemma 2 (2B, 4-bit; 2B, 8-bit; 9B, 4-bit)
- Qwen 2.5 (0.5B, 8-bit; 1.5B, 8-bit; 3B, 4-bit; 7B, 4-bit)
- Mistral 0.3 (7B, 4-bit)
- Phi 3.5 (3.8B, 4-bit)
- Mistral Nemo (12B, 4-bit)
- GLM4 (9B, 4-bit)
- InternLM2 v2.5 (7B, 4-bit)
- Falcon2 (11B, 4-bit)
- StableLM2 (1.6B, 8-bit; 12B, 4-bit)
- Yi v1.5 (6B, 4-bit; 9B, 4-bit)
- Ministral (8B, 4-bit)
- Dolphin 2.9.4 (8B, 4-bit)
- Granite 3 Dense (2B, 8-bit; 8B, 4-bit)
- Granite 3 MoE (1B, 8-bit; 3B, 4-bit)
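
Since the arena is customized through `arena_config.py`, a model entry might be described along these lines; the variable name and fields below are purely hypothetical and not the file's actual format:

```python
# Hypothetical structure; check arena_config.py for the real format.
APPROVED_MODELS = [
    {"name": "LLaMA 3.2", "parameters": "3B",   "quantization": "8-bit"},
    {"name": "Gemma 2",   "parameters": "2B",   "quantization": "4-bit"},
    {"name": "Qwen 2.5",  "parameters": "0.5B", "quantization": "8-bit"},
]
```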

## 🤝 Contributing

@@ -131,4 +169,4 @@ This project is open-source and available under the MIT License
- Thanks to the Ollama team for providing that amazing tool.
- Shoutout to all the AI researchers and compact language models teams for making this frugal AI arena possible!

Enjoy the battles in the GPU-Poor LLM Gladiator Arena! May the best compact model win! 🏆