Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos Paper • 2501.04001 • Published 5 days ago • 36
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published 3 days ago • 55
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 4 days ago • 190
ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation Paper • 2204.12484 • Published Apr 26, 2022 • 2
TransPixar: Advancing Text-to-Video Generation with Transparency Paper • 2501.03006 • Published 6 days ago • 19
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published 5 days ago • 18