Large Motion Video Autoencoding with Cross-modal Video VAE Paper • 2412.17805 • Published 2 days ago • 20
NILE: Internal Consistency Alignment in Large Language Models Paper • 2412.16686 • Published 4 days ago • 6
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching Paper • 2412.17153 • Published 3 days ago • 27
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners Paper • 2412.17256 • Published 3 days ago • 32
Diving into Self-Evolving Training for Multimodal Reasoning Paper • 2412.17451 • Published 2 days ago • 27
Offline Reinforcement Learning for LLM Multi-Step Reasoning Paper • 2412.16145 • Published 5 days ago • 30
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution Paper • 2412.15213 • Published 6 days ago • 25
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion Paper • 2412.14462 • Published 7 days ago • 15
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published 13 days ago • 19
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images Paper • 2412.03517 • Published 21 days ago • 18
PaliGemma 2: A Family of Versatile VLMs for Transfer Paper • 2412.03555 • Published 21 days ago • 118
Imagine360: Immersive 360 Video Generation from Perspective Anchor Paper • 2412.03552 • Published 21 days ago • 26
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability Paper • 2411.19943 • Published 26 days ago • 55
Open-Sora Plan: Open-Source Large Video Generation Model Paper • 2412.00131 • Published 27 days ago • 32
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? Paper • 2411.16489 • Published about 1 month ago • 40