LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model • Paper • 2404.01331 • Published Mar 29 • 24
BRAVE: Broadening the visual encoding of vision-language models • Paper • 2404.07204 • Published Apr 10 • 17
Salesforce/xgen-mm-phi3-mini-instruct-r-v1 • Image-Text-to-Text • Updated about 1 month ago • 56.2k • 184