PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data Paper • 2502.14397 • Published 5 days ago • 33
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published 5 days ago • 115
SmolVLM2 📺 Smallest video LM ever 🤏🏻 Collection 11 items • Updated about 4 hours ago • 38
ImageRAG: Dynamic Image Retrieval for Reference-Guided Image Generation Paper • 2502.09411 • Published 12 days ago • 17
Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model Paper • 2502.10248 • Published 11 days ago • 50
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 8 days ago • 89
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening Paper • 2502.12146 • Published 8 days ago • 15
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published 13 days ago • 27
Step-Audio Collection Step-Audio model family, including Audio-Tokenizer, Audio-Chat and TTS • 3 items • Updated 8 days ago • 28
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published 21 days ago • 56
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published Jan 24 • 30
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models Paper • 2502.01639 • Published 22 days ago • 24
DynVFX: Augmenting Real Videos with Dynamic Content Paper • 2502.03621 • Published 20 days ago • 27
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published 21 days ago • 192