k-mktr committed on
Commit 0223ad5 • 1 Parent(s): e19f726

Update README.md

Files changed (1)
  1. README.md +19 -1
README.md CHANGED
```diff
@@ -31,7 +31,7 @@ In the recent months, we've seen a lot of these "Tiny" models released, and some
 ## 🌟 Features
 
 - **Battle Arena**: Pit two mystery models against each other and decide which pint-sized powerhouse reigns supreme.
-- **Leaderboard**: Track the performance of different models over time.
+- **Leaderboard**: Track the performance of different models over time using an improved scoring system.
 - **Performance Chart**: Visualize model performance with interactive charts.
 - **Privacy-Focused**: Uses local Ollama API, avoiding pricey commercial APIs and keeping data close to home.
 - **Customizable**: Easy to add new models and prompts.
@@ -87,6 +87,24 @@ You can customize the arena by modifying the `arena_config.py` file:
 
 The leaderboard data is stored in `leaderboard.json`. This file is automatically updated after each battle.
 
+### Scoring System
+
+We use a sophisticated scoring system to rank the models fairly:
+
+1. We calculate a score for each model using the formula:
+```
+score = win_rate * (1 - 1 / (total_battles + 1))
+```
+This formula balances win rate with the number of battles, giving more weight to models that have participated in more battles.
+
+2. We sort the results primarily by this new score, and secondarily by the total number of battles. This ensures that models with similar scores are ranked by their experience (number of battles).
+
+3. The leaderboard displays this calculated score alongside wins, losses, and other statistics.
+
+4. The ranking is based on this sophisticated score instead of just the number of wins.
+
+This approach provides a fairer ranking system that considers both performance (win rate) and experience (total battles). Models that maintain a high win rate over many battles will be ranked higher than those with fewer battles or lower win rates.
+
 ## 🤖 Models
 
 The arena currently supports various compact models, including:
```
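The scoring rule and sort order added in this commit can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's code: `ModelStats`, `score`, and `rank` are hypothetical names, and the sketch assumes every battle ends in a win or a loss, so that `total_battles = wins + losses` and `win_rate = wins / total_battles`. Under that assumption the formula simplifies: for `total_battles > 0`, `win_rate * (1 - 1 / (total_battles + 1))` equals `(wins / total_battles) * (total_battles / (total_battles + 1))`, i.e. `wins / (total_battles + 1)`. The factor `total_battles / (total_battles + 1)` is 1/2 after one battle and approaches 1 as battles accumulate, so it discounts small samples rather than rewarding volume without limit.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelStats:
    """Hypothetical per-model record; the field names are illustrative only."""

    name: str
    wins: int
    losses: int

    @property
    def total_battles(self) -> int:
        # Assumption: every battle ends in a win or a loss (no draws).
        return self.wins + self.losses

    @property
    def win_rate(self) -> float:
        # Guard against division by zero for a model with no battles yet.
        return self.wins / self.total_battles if self.total_battles else 0.0


def score(stats: ModelStats) -> float:
    """Apply the README formula: win_rate * (1 - 1 / (total_battles + 1))."""
    return stats.win_rate * (1 - 1 / (stats.total_battles + 1))


def rank(models: list[ModelStats]) -> list[ModelStats]:
    """Sort by score, then by total battles, both in descending order."""
    return sorted(models, key=lambda m: (score(m), m.total_battles), reverse=True)


if __name__ == "__main__":
    models = [
        ModelStats("model-a", wins=1, losses=0),    # 100% win rate, 1 battle
        ModelStats("model-b", wins=6, losses=4),    # 60% win rate, 10 battles
        ModelStats("model-c", wins=60, losses=40),  # 60% win rate, 100 battles
    ]
    for m in rank(models):
        print(f"{m.name}: win_rate={m.win_rate:.2f} "
              f"battles={m.total_battles} score={score(m):.3f}")
    # model-c: win_rate=0.60 battles=100 score=0.594
    # model-b: win_rate=0.60 battles=10 score=0.545
    # model-a: win_rate=1.00 battles=1 score=0.500
```

With these numbers, a perfect record over a single battle scores 0.5 and falls below a 60% win rate over 10 battles. At equal win rate the model with more battles always ranks higher, but a sufficiently higher win rate can still outrank a model with more battles, so the formula weighs both factors rather than letting battle count dominate.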