Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5 Image-Text-to-Text • Updated Feb 3 • 6.12k • 51
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published Aug 22, 2024 • 36
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024 • 99
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 58
Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models Paper • 2209.07511 • Published Sep 15, 2022