Collections
Discover the best community collections!
Collections including paper arxiv:2412.14169
-
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
Paper • 2412.15213 • Published • 25 -
No More Adam: Learning Rate Scaling at Initialization is All You Need
Paper • 2412.11768 • Published • 40 -
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Paper • 2412.13663 • Published • 103 -
Autoregressive Video Generation without Vector Quantization
Paper • 2412.14169 • Published • 13
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 15 -
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 8 -
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 10 -
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13
-
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Paper • 2412.11100 • Published • 5 -
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Paper • 2412.09856 • Published • 9 -
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation
Paper • 2412.09349 • Published • 7 -
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation
Paper • 2412.04448 • Published • 9
-
TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation
Paper • 2412.03069 • Published • 30 -
Are Emergent Abilities of Large Language Models a Mirage?
Paper • 2304.15004 • Published • 6 -
Scaling Image Tokenizers with Grouped Spherical Quantization
Paper • 2412.02632 • Published • 10 -
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Paper • 2410.13848 • Published • 31
-
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model
Paper • 2408.16767 • Published • 30 -
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation
Paper • 2411.16657 • Published • 17 -
Autoregressive Video Generation without Vector Quantization
Paper • 2412.14169 • Published • 13 -
Progressive Multimodal Reasoning via Active Retrieval
Paper • 2412.14835 • Published • 66