SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models Paper • 2412.07755 • Published Dec 10, 2024 • 1
COLA: How to adapt vision-language models to Compose Objects Localized with Attributes? Paper • 2305.03689 • Published May 5, 2023 • 2