VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping Paper • 2412.11279 • Published 28 days ago • 12
Causal Diffusion Transformers for Generative Modeling Paper • 2412.12095 • Published 27 days ago • 23
EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM Paper • 2412.09618 • Published Dec 12, 2024 • 21
Enhancing Vision-Language Model with Unmasked Token Alignment Paper • 2405.19009 • Published May 29, 2024 • 1
MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment Paper • 2406.19736 • Published Jun 28, 2024 • 2
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 124
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28, 2024 • 84