Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents Paper • 2508.05954 • Published 10 days ago • 6
A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality Paper • 2507.07202 • Published Jul 9 • 22
EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance Paper • 2505.21876 • Published May 28 • 9
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting Paper • 2504.15485 • Published Apr 21 • 5
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems Paper • 2504.09763 • Published Apr 14 • 13
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization Paper • 2504.08641 • Published Apr 11 • 7
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation Paper • 2411.16657 • Published Nov 25, 2024 • 20
VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement Paper • 2411.15115 • Published Nov 22, 2024 • 9
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning Paper • 2309.15091 • Published Sep 26, 2023 • 33