Model,Avg. All,Avg. Timbre,Avg. Tone,Avg. Melody,Avg. Space,Avg. Time,Avg. Hallucination,Avg. Intricacy,Instrument Recognition,Singer Recognition,Gunshot Recognition,Bird Recognition,Animal Recognition,Transportation Recognition,Material Recognition,Scene Recognition,Hazard Recognition,Action Recognition,Eating Sound Recognition,Speech Sentiment Analysis,Meme Understanding,Music Sentiment Analysis,Music Genre Classification,Dance and Music Matching,Film and Music Matching,Music Score Matching,Audio 3D Angle Estimation,Audio Distance Estimation,Audio Time Estimation,Audio-Visual Synchronization,Action Sequencing,Hallucination Evaluation,Action Prediction,Action Tracing
[Unified-IO-2 L](https://unified-io-2.allenai.org/),26.0,23.8,24.1,28.8,15.0,26.8,30.0,30.4,20.5,22.5,25.5,18.5,27.0,26.5,23.0,28.0,21.3,20.9,26.5,24.5,20.0,27.9,31.0,27.5,32.5,24.5,15.0,15.0,28.0,25.5,27.0,30.0,27.1,33.8
[Unified-IO-2 XL](https://unified-io-2.allenai.org/),26.3,24.3,23.2,27.8,22.5,25.3,31.5,34.8,20.0,23.5,24.0,20.5,27.5,26.0,27.5,30.0,19.4,19.9,26.5,23.0,25.0,26.9,30.5,27.0,31.5,22.5,30.0,15.0,26.5,25.5,24.0,31.5,35.7,33.8
[Unified-IO-2 XXL](https://unified-io-2.allenai.org/),27.2,26.3,22.7,26.4,32.5,26.8,24.5,33.8,29.5,24.0,23.5,29.0,23.5,25.5,30.5,26.5,23.1,27.0,25.5,23.0,20.0,23.9,31.5,27.5,24.5,23.5,50.0,15.0,28.0,25.0,27.5,24.5,33.2,34.4
[OneLLM](https://github.com/csuhan/OneLLM),27.4,25.0,25.5,21.5,37.5,29.3,25.5,38.4,26.0,21.5,27.0,26.0,22.0,20.0,29.5,24.5,26.9,23.0,29.5,26.0,20.0,20.8,23.5,26.5,18.5,18.0,45.0,30.0,31.5,29.5,27.0,25.5,41.7,34.9
[PandaGPT](https://panda-gpt.github.io/),26.7,23.5,23.2,27.6,45.0,23.8,28.0,23.9,20.0,21.5,23.0,17.5,26.0,26.5,28.0,27.0,23.1,21.4,24.5,23.5,20.0,21.6,28.0,27.0,32.5,26.0,45.0,45.0,18.5,26.0,27.0,28.0,19.6,28.2
[Video-llama](https://github.com/DAMO-NLP-SG/Video-LLaMA),26.1,25.5,22.3,24.4,30.0,26.2,25.0,30.7,22.5,24.5,27.0,26.5,27.0,23.5,28.0,25.0,25.0,26.0,25.5,23.0,15.0,25.8,24.0,20.0,25.0,28.0,45.0,15.0,28.5,23.5,26.5,25.0,28.6,32.8
[VideoLLaMA2](https://github.com/DAMO-NLP-SG/VideoLLaMA2),26.8,24.1,25.5,26.4,30.0,27.2,33.0,34.5,22.5,24.0,27.0,17.0,23.5,27.5,26.5,26.5,19.4,23.0,25.5,26.0,20.0,26.8,29.0,25.5,30.5,20.5,45.0,15.0,28.5,26.5,26.5,33.0,28.6,40.5
[AnyGPT](https://junzhan2000.github.io/AnyGPT.github.io/),26.1,24.6,25.0,26.4,27.5,29.2,29.0,25.7,22.5,28.5,28.0,17.5,24.0,25.5,23.0,28.0,25.9,20.4,27.5,25.5,20.0,23.4,29.5,25.5,26.0,26.0,40.0,15.0,30.5,28.0,29.0,29.0,21.1,30.3
[NExT-GPT](https://next-gpt.github.io/),25.5,23.2,20.9,27.8,30.0,28.8,28.5,23.6,21.0,23.5,25.5,21.5,25.5,25.5,21.0,24.0,19.4,23.0,24.0,21.5,15.0,23.7,26.0,28.0,31.0,28.0,45.0,15.0,31.5,24.0,31.0,28.5,20.6,26.7
[VITA](https://vita-home.github.io/),26.4,24.1,26.4,27.8,22.5,26.3,31.0,36.8,22.0,20.5,24.5,21.5,27.5,25.0,23.5,28.5,21.3,19.4,29.5,24.5,45.0,26.9,26.0,27.5,33.5,24.5,25.0,20.0,26.5,25.5,27.0,31.0,34.2,39.5
[Gemini 1.5 Flash](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf),27.8,27.2,25.0,28.8,30.0,25.3,28.5,31.2,24.5,24.0,23.5,17.0,32.5,26.0,22.5,29.5,34.3,48.0,21.5,23.5,40.0,21.3,31.0,27.5,32.5,28.0,30.0,30.0,27.5,23.5,25.0,28.5,27.6,34.9
[Gemini 1.5 Flash-8B](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf),26.8,25.1,24.5,28.9,27.5,27.5,29.0,30.2,16.5,22.5,24.0,19.0,28.0,26.5,27.0,29.0,26.9,32.7,24.5,24.5,25.0,25.9,33.0,27.5,32.0,24.5,40.0,15.0,31.0,25.5,26.0,29.0,25.6,34.9
[Gemini 1.5 Pro](https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf),30.8,30.8,31.4,31.3,37.5,27.7,20.5,33.0,33.0,26.0,29.0,25.0,25.5,26.0,29.5,30.0,38.0,57.7,22.5,29.5,50.0,25.4,42.5,28.0,28.5,29.0,35.0,40.0,30.0,24.5,28.5,20.5,32.2,33.8
[Reka Core](https://arxiv.org/abs/2404.12387),26.9,26.7,27.7,26.4,22.5,26.5,24.0,34.3,32.5,20.0,26.5,25.0,24.0,27.0,30.0,27.0,25.0,34.2,21.5,28.5,20.0,22.8,24.5,27.5,30.0,25.5,25.0,20.0,30.0,25.5,24.0,24.0,33.7,34.9
[Reka Flash](https://arxiv.org/abs/2404.12387),26.3,25.5,24.1,27.2,30.0,27.5,31.5,24.1,20.0,22.5,26.5,26.0,28.5,26.5,26.5,29.0,28.7,22.4,25.0,24.5,20.0,30.5,29.5,27.5,25.5,24.5,45.0,15.0,30.0,25.5,27.0,31.5,19.1,29.2
[Reka Edge](https://arxiv.org/abs/2404.12387),25.0,23.8,20.5,26.3,22.5,25.5,22.5,36.8,21.5,24.0,30.5,20.0,19.5,22.5,20.5,25.5,25.9,23.5,29.0,20.5,20.0,24.9,24.5,27.5,30.0,24.0,30.0,15.0,30.0,25.5,21.0,22.5,38.2,35.4
[GPT-4o visual caption](https://openai.com/index/hello-gpt-4o/),32.3,37.4,28.6,32.3,27.5,25.5,23.0,28.9,33.0,30.5,24.0,26.5,43.0,42.0,32.5,39.0,49.1,67.3,30.5,26.0,55.0,24.4,48.0,27.0,34.5,23.5,25.0,30.0,21.5,22.5,32.5,23.0,32.2,25.6
[GPT-4o audio caption](https://openai.com/index/hello-gpt-4o/),34.5,38.6,31.8,33.6,32.5,27.5,25.0,26.1,40.0,38.0,27.5,26.5,45.0,42.0,27.0,41.0,42.6,62.2,35.5,28.0,70.0,24.4,56.5,27.5,32.5,22.5,30.0,35.0,23.5,25.5,33.5,25.0,30.2,22.0