Update README.md
README.md
CHANGED
@@ -4,7 +4,7 @@ emoji: 🏆
colorFrom: blue
colorTo: purple
sdk: gradio
-sdk_version: 5.
+sdk_version: 5.3.0
app_file: app.py
pinned: false
license: mit

@@ -87,36 +87,74 @@ You can customize the arena by modifying the `arena_config.py` file:

The leaderboard data is stored in `leaderboard.json`. This file is automatically updated after each battle.

-### Scoring System
-We use a
-This formula balances win rate with the number of battles, giving more weight to models that have participated in more battles.
-The

### Main Leaderboard Scoring System

We use a scoring system to rank the models fairly. The score for each model is calculated using the following formula:

```
Score = Win Rate * (1 - 1 / (Total Battles + 1))
```

Let's break down this formula:

1. **Win Rate**: This is the number of wins divided by the total number of battles. It ranges from 0 (no wins) to 1 (all wins).

2. **1 - 1 / (Total Battles + 1)**: This factor adjusts the win rate based on the number of battles:
   - We add 1 to the total battles to avoid division by zero and to ensure that even with just one battle, the score isn't discounted too heavily.
   - As the number of battles increases, this factor approaches 1.
   - For example:
     - With 1 battle: 1 - 1/2 = 0.5
     - With 10 battles: 1 - 1/11 ≈ 0.91
     - With 100 battles: 1 - 1/101 ≈ 0.99

3. **Purpose of this adjustment**:
   - It gives more weight to models that have participated in more battles.
   - A model with a high win rate but few battles will have a lower score than a model with the same win rate but more battles.
   - This encourages models to participate in more battles to improve their score.

4. **How it works in practice**:
   - For a new model with just one battle, its score will be at most 50% of its win rate.
   - As the model participates in more battles, its score will approach its actual win rate.
   - This prevents models with very few battles from dominating the leaderboard based on lucky wins.

In essence, this formula balances two factors:
1. How well a model performs (win rate)
2. How much experience it has (total battles)

It ensures that the leaderboard favors models that consistently perform well over a larger number of battles, rather than those that might have a high win rate from just a few lucky encounters.
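
In Python, the calculation could look something like this minimal sketch (the helper name `calculate_score` is illustrative, not necessarily the arena's actual code):

```python
def calculate_score(wins: int, total_battles: int) -> float:
    """Score = Win Rate * (1 - 1 / (Total Battles + 1))."""
    if total_battles == 0:
        return 0.0  # nothing to rank yet
    win_rate = wins / total_battles
    experience_factor = 1 - 1 / (total_battles + 1)
    return win_rate * experience_factor

# Worked examples from above (all with a 100% win rate):
# calculate_score(1, 1)     -> 0.5
# calculate_score(10, 10)   -> ~0.909
# calculate_score(100, 100) -> ~0.990
```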

We sort the results primarily by this calculated score, and secondarily by the total number of battles. This ensures that models with similar scores are ranked by their experience (number of battles).

The leaderboard displays this calculated score alongside wins, losses, and other statistics.
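
A sketch of that two-level sort follows; the entry fields below are hypothetical (the real `leaderboard.json` structure may differ) and only serve to illustrate the ordering:

```python
# Hypothetical leaderboard entries; field names are illustrative only.
leaderboard = [
    {"model": "model-a", "wins": 9,  "losses": 1, "battles": 10},
    {"model": "model-b", "wins": 52, "losses": 8, "battles": 60},
]

for entry in leaderboard:
    win_rate = entry["wins"] / entry["battles"]
    entry["score"] = win_rate * (1 - 1 / (entry["battles"] + 1))

# Primary key: score (descending); secondary key: total battles (descending).
ranked = sorted(leaderboard, key=lambda e: (e["score"], e["battles"]), reverse=True)
```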

### ELO Leaderboard

In addition to the main leaderboard, we also maintain an ELO-based leaderboard:

- Models start with an initial ELO rating based on their size.
- ELO ratings are updated after each battle, with adjustments made based on the size difference between models (see the sketch below).
- The ELO leaderboard provides an alternative perspective on model performance, taking into account the relative strengths of opponents.
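
This README does not spell out the exact initial ratings or the size-based adjustment, so the following is only a generic Elo update sketch; the K-factor and how model size would scale it are assumptions for illustration:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return both ratings after one battle (K could be scaled by the size difference)."""
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (actual_a - expected_a)
    new_b = rating_b + k * ((1.0 - actual_a) - (1.0 - expected_a))
    return new_a, new_b
```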

## 🤖 Models

The arena currently supports the following compact models:

- LLaMA 3.2 (1B, 3B, 8-bit)
- LLaMA 3.1 (8B, 4-bit)
- Gemma 2 (2B, 4-bit; 2B, 8-bit; 9B, 4-bit)
- Qwen 2.5 (0.5B, 8-bit; 1.5B, 8-bit; 3B, 4-bit; 7B, 4-bit)
- Mistral 0.3 (7B, 4-bit)
- Phi 3.5 (3.8B, 4-bit)
- Mistral Nemo (12B, 4-bit)
- GLM4 (9B, 4-bit)
- InternLM2 v2.5 (7B, 4-bit)
- Falcon2 (11B, 4-bit)
- StableLM2 (1.6B, 8-bit; 12B, 4-bit)
- Yi v1.5 (6B, 4-bit; 9B, 4-bit)
- Ministral (8B, 4-bit)
- Dolphin 2.9.4 (8B, 4-bit)
- Granite 3 Dense (2B, 8-bit; 8B, 4-bit)
- Granite 3 MoE (1B, 8-bit; 3B, 4-bit)
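
Since the arena is customized through `arena_config.py`, a model entry might be described along these lines; the variable name and fields below are purely hypothetical and not the file's actual format:

```python
# Hypothetical structure; check arena_config.py for the real format.
APPROVED_MODELS = [
    {"name": "LLaMA 3.2", "parameters": "3B",   "quantization": "8-bit"},
    {"name": "Gemma 2",   "parameters": "2B",   "quantization": "4-bit"},
    {"name": "Qwen 2.5",  "parameters": "0.5B", "quantization": "8-bit"},
]
```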

## 🤝 Contributing

@@ -131,4 +169,4 @@ This project is open-source and available under the MIT License
- Thanks to the Ollama team for providing that amazing tool.
- Shoutout to all the AI researchers and compact language models teams for making this frugal AI arena possible!

Enjoy the battles in the GPU-Poor LLM Gladiator Arena! May the best compact model win! 🏆