Vision Language Models: 2025 Update

This collection includes all the models, datasets, and Spaces mentioned in the blog post "Vision Language Models: 2025 Update".

Qwen/Qwen2.5-Omni-7B
Any-to-Any • Updated Apr 30 • 396k • 1.66k

Qwen2.5 Omni 7B Demo 🏆
Running • 321
Generate text and speech responses from text, images, or audio input

Qwen2.5-Omni Technical Report
Paper • 2503.20215 • Published Mar 26 • 158

openbmb/MiniCPM-o-2_6
Any-to-Any • Updated 2 days ago • 163k • 1.17k

VQA Autonomous Driving SmolVLM2 🌖
Running on Zero • 2
Visual Question Answering - Autonomous Driving - SmolVLM2

sergiopaniego/gemma-3-4b-pt-object-detection-loc-tokens
Image-Text-to-Text • Updated about 21 hours ago