rename it FM4SE Leaderboard
--- a/app.py
+++ b/app.py
@@ -479,9 +479,8 @@ with gr.Blocks() as app:
 # Add title and description as a Markdown component
 leaderboard_intro = gr.Markdown(
 """
-# 🏆
-
-The SE Arena is an open-source platform designed to evaluate language models through human preference, fostering transparency and collaboration. Developed by researchers at [Software Analysis and Intelligence Lab (SAIL)](https://sail.cs.queensu.ca), the platform empowers the community to assess and compare the performance of leading foundation models in SE tasks. For technical details, check out our [paper](https://arxiv.org/abs/2502.01860).
+# 🏆 FM4SE Leaderboard: Community-Driven Evaluation of Top Foundation Models (FMs) in Software Engineering (SE) Tasks
+The SE Arena is an open-source platform designed to evaluate foundation models through human preference, fostering transparency and collaboration. Developed by researchers at [Software Analysis and Intelligence Lab (SAIL)](https://sail.cs.queensu.ca), the platform empowers the community to assess and compare the performance of leading FMs in SE tasks. For technical details, check out our [paper](https://arxiv.org/abs/2502.01860).
 """,
 elem_classes="leaderboard-intro",
 )
@@ -522,12 +521,12 @@ with gr.Blocks() as app:
 # Add title and description as a Markdown component
 arena_intro = gr.Markdown(
 f"""
-# ⚔️
+# ⚔️ SE Arena: Explore and Test Top FMs with SE Tasks
 
 ## 📜How It Works
-- **Blind Comparison**: Submit a SE-related query to two anonymous
-- **Interactive Voting**: Engage in multi-turn dialogues with both
-- **Fair Play Rules**: Votes are counted only if
+- **Blind Comparison**: Submit a SE-related query to two anonymous FMs randomly selected from up to {len(available_models)} top models from OpenAI, Gemini, Grok, Claude, Deepseek, Qwen, Llama, Mistral, and others.
+- **Interactive Voting**: Engage in multi-turn dialogues with both FMs and compare their responses. You can continue the conversation until you confidently choose the better model.
+- **Fair Play Rules**: Votes are counted only if FM identities remain anonymous. Revealing a FM's identity disqualifies the session.
 
 **Note:** Due to budget constraints, responses that take longer than {TIMEOUT} seconds to generate will be discarded.
 """,
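
Note on the "How It Works" text: the Blind Comparison and Fair Play Rules bullets describe behavior that this diff does not show. The sketch below is one possible reading of those two rules, not code from app.py. The helper names `pick_blind_pair` and `vote_counts` are hypothetical, and the list passed in only stands in for `available_models`.

```python
import random


def pick_blind_pair(available_models, rng=None):
    """Pick two distinct models at random for a blind comparison.

    Hypothetical helper: `available_models` is assumed to be a list of
    model names, mirroring the variable referenced in the f-string.
    """
    rng = rng or random.Random()
    if len(available_models) < 2:
        raise ValueError("a blind comparison needs at least two models")
    # random.sample draws without replacement, so the two picks differ.
    left, right = rng.sample(available_models, 2)
    return left, right


def vote_counts(identity_revealed):
    """Apply the Fair Play rule: a vote counts only while models stay anonymous."""
    return not identity_revealed


if __name__ == "__main__":
    models = ["model-a", "model-b", "model-c"]  # placeholder names
    print(pick_blind_pair(models))
    print(vote_counts(identity_revealed=False))
```

Passing a seeded `random.Random` as `rng` makes the pairing reproducible, which is convenient for testing; the real application may select models differently.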