Visual-Riddles-Leaderboard / Visual-Riddles-Leaderboard.tsv
Uploaded by nitzanguetta (commit 4dd5ec2, verified)
940 Bytes
Model Open Ended VQA: % Human Rating Multiple Choice VQA: % Accuracy Hints-Multiple Choice VQA: % Accuracy Attributions-Multiple Choice VQA: % Accuracy Reference Based-Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings Reference Free-Automatic Evaluation: Accuracy of Judge Prediction Compared to Human Ratings Automatic Evaluation: % Auto-Rater Ratings Hints-Automatic Evaluation: % Auto-Rater Ratings Attributions-Automatic Evaluation: % Auto-Rater Ratings
Humans 82 78
Gemini Pro 1.5 40 38 66 72 87 52 53 62 29
Gemini Pro Vision 30 41 62 75 38 34 47
GPT4 34 45 69 82 86 51 38 61 25
LLaVA-1.6-34B 15 24 30 76 43 21 16
LLaVA-1.5-7B 13 17 29 70 35 19 30
InstructBLIP 13 20 28
Gemini Pro 1.5 Caption _ Gemini Pro 1.5 23
Human (Oracle) Caption _ Gemini Pro 1.5 50
Claude 3.5 Sonnet 46 45 39
GPT4o 55 83 50
Qwen-VL-Max 35 53 26
Molmo-7B 34 42 36