OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published 13 days ago • 10
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Paper • 2405.05949 • Published May 9 • 2
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published 13 days ago • 10
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28 • 84
StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text Paper • 2403.14773 • Published Mar 21 • 10
VCoder: Versatile Vision Encoders for Multimodal Large Language Models Paper • 2312.14233 • Published Dec 21, 2023 • 16
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models Paper • 2312.14091 • Published Dec 21, 2023 • 15
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models Paper • 2312.04410 • Published Dec 7, 2023 • 14
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models Paper • 2312.04410 • Published Dec 7, 2023 • 14
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models Paper • 2312.04410 • Published Dec 7, 2023 • 14
HiFi Tuner: High-Fidelity Subject-Driven Fine-Tuning for Diffusion Models Paper • 2312.00079 • Published Nov 30, 2023 • 14
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model Paper • 2211.08332 • Published Nov 15, 2022
Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators Paper • 2303.13439 • Published Mar 23, 2023 • 4
OneFormer: One Transformer to Rule Universal Image Segmentation Paper • 2211.06220 • Published Nov 10, 2022