InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity Paper • 2503.16418 • Published 5 days ago • 32
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published 11 days ago • 75
EasyControl: Adding Efficient and Flexible Control for Diffusion Transformer Paper • 2503.07027 • Published 16 days ago • 26
VideoPainter: Any-length Video Inpainting and Editing with Plug-and-Play Context Control Paper • 2503.05639 • Published 18 days ago • 22
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published 20 days ago • 84