zhiminy committed
Commit 9a67f03 · 1 Parent(s): 065faaf

rename it FM4SE Leaderboard

Files changed (1): app.py (+6 -7)
```diff
--- a/app.py
+++ b/app.py
@@ -479,9 +479,8 @@ with gr.Blocks() as app:
     # Add title and description as a Markdown component
     leaderboard_intro = gr.Markdown(
         """
-        # 🏆 Software Engineering (SE) Chatbot Leaderboard: Community-Driven Evaluation of Top SE Chatbots
-
-        The SE Arena is an open-source platform designed to evaluate language models through human preference, fostering transparency and collaboration. Developed by researchers at [Software Analysis and Intelligence Lab (SAIL)](https://sail.cs.queensu.ca), the platform empowers the community to assess and compare the performance of leading foundation models in SE tasks. For technical details, check out our [paper](https://arxiv.org/abs/2502.01860).
+        # 🏆 FM4SE Leaderboard: Community-Driven Evaluation of Top Foundation Models (FMs) in Software Engineering (SE) Tasks
+        The SE Arena is an open-source platform designed to evaluate foundation models through human preference, fostering transparency and collaboration. Developed by researchers at [Software Analysis and Intelligence Lab (SAIL)](https://sail.cs.queensu.ca), the platform empowers the community to assess and compare the performance of leading FMs in SE tasks. For technical details, check out our [paper](https://arxiv.org/abs/2502.01860).
         """,
         elem_classes="leaderboard-intro",
     )
@@ -522,12 +521,12 @@ with gr.Blocks() as app:
     # Add title and description as a Markdown component
     arena_intro = gr.Markdown(
         f"""
-        # ⚔️ Software Engineering (SE) Arena: Explore and Test the Best SE Chatbots with Long-Context Interactions
+        # ⚔️ SE Arena: Explore and Test Top FMs with SE Tasks
 
         ## 📜How It Works
-        - **Blind Comparison**: Submit a SE-related query to two anonymous chatbots randomly selected from up to {len(available_models)} top models from OpenAI, Gemini, Grok, Claude, Deepseek, Qwen, Llama, Mistral, and others.
-        - **Interactive Voting**: Engage in multi-turn dialogues with both chatbots and compare their responses. You can continue the conversation until you confidently choose the better model.
-        - **Fair Play Rules**: Votes are counted only if chatbot identities remain anonymous. Revealing a chatbot's identity disqualifies the session.
+        - **Blind Comparison**: Submit a SE-related query to two anonymous FMs randomly selected from up to {len(available_models)} top models from OpenAI, Gemini, Grok, Claude, Deepseek, Qwen, Llama, Mistral, and others.
+        - **Interactive Voting**: Engage in multi-turn dialogues with both FMs and compare their responses. You can continue the conversation until you confidently choose the better model.
+        - **Fair Play Rules**: Votes are counted only if FM identities remain anonymous. Revealing a FM's identity disqualifies the session.
 
         **Note:** Due to budget constraints, responses that take longer than {TIMEOUT} seconds to generate will be discarded.
         """,
```