| Category | Model | Elo | # Matches | Win vs. Reference (w/ # ratings) |
|---|---|---|---|---|
| Single Image | GPT4V | 1349 | 677 | 65.44% (n=136) |
| Single Image | Human Verified Reference | 1338 | 6480 | --- |
| Single Image | LLaVA-Plus | 1187 | 812 | 30.15% (n=136) |
| Single Image | LLaVA 13B | 1091 | 5574 | 18.53% (n=475) |
| Single Image | LlamaAdapter-v2 | 1066 | 5573 | 14.14% (n=488) |
| Single Image | mPLUG-Owl | 1025 | 5561 | 15.83% (n=480) |
| Single Image | idefics9b | 997 | 940 | 9.72% (n=144) |
| Single Image | Lynx(8B) | 990 | 929 | 11.43% (n=140) |
| Single Image | InstructBLIP | 964 | 5612 | 14.12% (n=503) |
| Single Image | Otter | 947 | 5597 | 7.01% (n=499) |
| Single Image | Octopus V2 | 920 | 913 | 8.90% (n=146) |
| Single Image | VisualGPT | 911 | 5585 | 1.57% (n=510) |
| Single Image | MiniGPT-4 | 900 | 5560 | 3.36% (n=506) |
| Single Image | OpenFlamingo | 845 | 5591 | 2.95% (n=509) |
| Single Image | PandaGPT 13b | 786 | 5573 | 2.70% (n=519) |
| Single Image | MMGPT | 718 | 5604 | 0.19% (n=527) |
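
To help read the Elo column, the sketch below converts a rating gap into an expected head-to-head score. It assumes the standard logistic Elo model with base 10 and a scale of 400; the table does not state which Elo variant or constants were used, so the function name `expected_score`, the `scale` parameter, and the choice of models shown are illustrative assumptions rather than the authors' procedure.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    Assumption: base-10 logistic curve with scale 400, which is the common
    convention; the source table does not confirm these constants.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Elo values copied from a few rows of the table above.
elo = {
    "GPT4V": 1349,
    "Human Verified Reference": 1338,
    "LLaVA 13B": 1091,
    "MMGPT": 718,
}

reference = elo["Human Verified Reference"]
for model, rating in elo.items():
    if model != "Human Verified Reference":
        # Model-implied probability of beating the reference in one match.
        print(f"{model}: {expected_score(rating, reference):.3f}")
```

The values printed here are what the Elo gap alone would imply and need not match the last column. That column reports the observed win rate over direct comparisons with the reference only (the n values), whereas the much larger match counts suggest the ratings draw on comparisons among all models.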