Update README.md
README.md
CHANGED
@@ -31,7 +31,7 @@ In the recent months, we've seen a lot of these "Tiny" models released, and some
|
|
31 |
## Features
|
32 |
|
33 |
- **Battle Arena**: Pit two mystery models against each other and decide which pint-sized powerhouse reigns supreme.
|
34 |
-
- **Leaderboard**: Track the performance of different models over time.
|
35 |
- **Performance Chart**: Visualize model performance with interactive charts.
|
36 |
- **Privacy-Focused**: Uses local Ollama API, avoiding pricey commercial APIs and keeping data close to home.
|
37 |
- **Customizable**: Easy to add new models and prompts.
|
@@ -87,6 +87,24 @@ You can customize the arena by modifying the `arena_config.py` file:
|
|
87 |
|
88 |
The leaderboard data is stored in `leaderboard.json`. This file is automatically updated after each battle.
|
89 |
|
90 |
## 🤖 Models
|
91 |
|
92 |
The arena currently supports various compact models, including:
|
|
|
31 |
## Features
|
32 |
|
33 |
- **Battle Arena**: Pit two mystery models against each other and decide which pint-sized powerhouse reigns supreme.
|
34 |
+
- **Leaderboard**: Track the performance of different models over time using an improved scoring system.
|
35 |
- **Performance Chart**: Visualize model performance with interactive charts.
|
36 |
- **Privacy-Focused**: Uses local Ollama API, avoiding pricey commercial APIs and keeping data close to home.
|
37 |
- **Customizable**: Easy to add new models and prompts.
|
|
|
87 |
|
88 |
The leaderboard data is stored in `leaderboard.json`. This file is automatically updated after each battle.
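As a rough illustration of what "automatically updated after each battle" could look like, here is a minimal sketch. The `record_battle` helper and the JSON layout (one `{"wins": ..., "losses": ...}` entry per model name) are assumptions made for this example; the actual schema of `leaderboard.json` is not shown in this diff.

```python
import json
from pathlib import Path


def record_battle(path, winner, loser):
    """Add one battle result to a leaderboard file and save it.

    Hypothetical helper: the per-model {"wins", "losses"} layout is an
    assumption, not the project's confirmed file format.
    """
    file = Path(path)
    # Start from an empty leaderboard if the file does not exist yet.
    board = json.loads(file.read_text()) if file.exists() else {}
    # Make sure both models have an entry before updating counts.
    for name in (winner, loser):
        board.setdefault(name, {"wins": 0, "losses": 0})
    board[winner]["wins"] += 1
    board[loser]["losses"] += 1
    # Write the whole file back so it always reflects the latest battle.
    file.write_text(json.dumps(board, indent=2))
    return board
```

Calling `record_battle("leaderboard.json", "model_a", "model_b")` after a vote would then increment one win and one loss and rewrite the file.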
|
89 |
|
90 |
+
### Scoring System
|
91 |
+
|
92 |
+
We rank the models with a score that combines win rate and number of battles:
|
93 |
+
|
94 |
+
1. We calculate a score for each model using the formula:
|
95 |
+
```
|
96 |
+
score = win_rate * (1 - 1 / (total_battles + 1))
|
97 |
+
```
|
98 |
+
This formula scales the win rate by a factor that approaches 1 as `total_battles` grows, so a model with only a few battles is discounted relative to one with the same win rate over many battles.
|
99 |
+
|
100 |
+
2. We sort the results primarily by this new score, and secondarily by the total number of battles. This ensures that models with equal scores are ordered by their experience (number of battles).
|
101 |
+
|
102 |
+
3. The leaderboard displays this calculated score alongside wins, losses, and other statistics.
|
103 |
+
|
104 |
+
4. The ranking is based on this score rather than on the raw number of wins.
|
105 |
+
|
106 |
+
This approach provides a fairer ranking system that considers both performance (win rate) and experience (total battles). Models that maintain a high win rate over many battles will be ranked higher than those with fewer battles or lower win rates.
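The formula and sort order above can be sketched in a few lines of Python. This is an illustration, not the project's actual code: the `score` and `rank` names, the `{"wins", "losses"}` input shape, and the reading of `win_rate` as `wins / total_battles` (a fraction between 0 and 1) are all assumptions.

```python
def score(wins, losses):
    """Score from the README formula, assuming win_rate = wins / total_battles."""
    total_battles = wins + losses
    if total_battles == 0:
        return 0.0  # no battles yet, so no win rate to scale
    win_rate = wins / total_battles
    return win_rate * (1 - 1 / (total_battles + 1))


def rank(stats):
    """Sort models by score, then by total battles, both descending.

    `stats` maps a model name to {"wins": int, "losses": int} (assumed shape).
    Returns a list of (name, score, total_battles) tuples.
    """
    rows = [
        (name, score(s["wins"], s["losses"]), s["wins"] + s["losses"])
        for name, s in stats.items()
    ]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)
```

Under that reading of `win_rate`, the formula simplifies to `wins / (total_battles + 1)`. A model with 9 wins and 1 loss scores 9/11 (about 0.82), while a model with a single win and no losses scores 0.5, so the more experienced model ranks first even though its win rate is lower.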
|
107 |
+
|
108 |
## 🤖 Models
|
109 |
|
110 |
The arena currently supports various compact models, including:
|