LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 41
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? Paper • 2311.00047 • Published Oct 31, 2023 • 8
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models Paper • 2310.08825 • Published Oct 13, 2023 • 1
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving Paper • 2311.05332 • Published Nov 9, 2023 • 9
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents Paper • 2311.05437 • Published Nov 9, 2023 • 48