Open-LLM performances are plateauing, let’s make the leaderboard steep again 🏔
Update leaderboard for fair model evaluation