Commit: update
constants.py (+3, -1)
@@ -32,7 +32,7 @@ We aim to provide cost-effective and accurate evaluation for multimodal models,
 ## Results & Takeaways from Evaluating Top Models
 
 
-### 
+### 2025.01
 
 - **Gemini 2.0 Experimental (1206)** and **Gemini 2.0 Flash Experimental** outperform **GPT-4o** and **Claude 3.5 Sonnet**.
 - We add **Grok-2-vision-1212** to the single-image leaderboard. The model seems to use a lot of tokens per image, and cannot run many of our multi-image and video tasks.
@@ -279,6 +279,7 @@ BASE_MODEL_GROUPS = {
         "Qwen2.5-VL-72B",
         "Gemma_3_27B_IT",
         "Gemini_2.5_pro_0325",
+        "InternVL3_38B",
         "InternVL3_78B",
     ],
     "Efficiency Models": [
@@ -338,6 +339,7 @@ BASE_MODEL_GROUPS = {
         "MiniMax-VL-01",
         "Qwen2.5-VL-72B",
         "Gemma_3_27B_IT",
+        "InternVL3_38B",
         "InternVL3_78B",
     ],
     "Open-source Efficiency Models": [
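For readers skimming the diff, here is a minimal sketch of the constant being edited and of how a leaderboard view might consume it. Only the model identifiers and the "Efficiency Models" / "Open-source Efficiency Models" keys are visible in the hunks above; the other group name and the `filter_rows_by_group` helper are illustrative assumptions, not the Space's actual code.

```python
# Sketch of BASE_MODEL_GROUPS after this commit (assumed layout; group names
# other than those visible in the hunks are placeholders).
BASE_MODEL_GROUPS = {
    "Flagship Models": [      # placeholder key; the real name is not shown in the diff
        "Qwen2.5-VL-72B",
        "Gemma_3_27B_IT",
        "Gemini_2.5_pro_0325",
        "InternVL3_38B",      # added by this commit
        "InternVL3_78B",
    ],
    "Efficiency Models": [],              # entries omitted here
    "Open-source Efficiency Models": [],  # entries omitted here
}


def filter_rows_by_group(rows, group_name):
    """Illustrative helper: keep leaderboard rows whose model is in the chosen group."""
    allowed = set(BASE_MODEL_GROUPS.get(group_name, []))
    return [row for row in rows if row["model"] in allowed]


# Example: once InternVL3_38B is listed in a group, it appears when that group is selected.
rows = [{"model": "InternVL3_38B", "score": 0.0}, {"model": "SomeOtherModel", "score": 0.0}]
print(filter_rows_by_group(rows, "Flagship Models"))
```

The design choice implied by the diff is that group membership lives in a plain dict of model-name lists, so adding a model to the leaderboard's group views is a one-line change per group.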