Collections

Collections including paper arxiv:2412.14169

NOVA: Autoregressive Video Generation without Vector Quantization

BAAI/nova-d48w768-sdxl1024

Text-to-Image • Updated 4 days ago • 108 • 2
BAAI/nova-d48w1024-osp480

Text-to-Video • Updated 4 days ago • 245 • 6
BAAI/nova-d48w1024-sdxl1024

Text-to-Image • Updated 4 days ago • 17 • 1
BAAI/nova-d48w1536-sdxl1024

Text-to-Image • Updated 4 days ago • 25 • 5

Paper - Multimodal

Papers related to multimodal models. Research topics: modular, multimodal, multi-stream, Mixture of Experts, Universal Transformer, Matryoshka embeddings

about 16 hours ago

Flowing from Words to Pixels: A Framework for Cross-Modality Evolution

Paper • 2412.15213 • Published 6 days ago • 25
No More Adam: Learning Rate Scaling at Initialization is All You Need

Paper • 2412.11768 • Published 9 days ago • 40
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 7 days ago • 103
Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published 7 days ago • 13

about 6 hours ago

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18 • 15
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18 • 8
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19 • 13

Video Generation

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Paper • 2412.11100 • Published 10 days ago • 5
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

Paper • 2412.09856 • Published 12 days ago • 9
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Paper • 2412.09349 • Published 13 days ago • 7
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation

Paper • 2412.04448 • Published 20 days ago • 9

Unified models that generate text, image, and video

TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation

Paper • 2412.03069 • Published 21 days ago • 30
Are Emergent Abilities of Large Language Models a Mirage?

Paper • 2304.15004 • Published Apr 28, 2023 • 6
Scaling Image Tokenizers with Grouped Spherical Quantization

Paper • 2412.02632 • Published 22 days ago • 10
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Paper • 2410.13848 • Published Oct 17 • 31

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Paper • 2408.16767 • Published Aug 29 • 30
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation

Paper • 2411.16657 • Published 30 days ago • 17
Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published 7 days ago • 13
Progressive Multimodal Reasoning via Active Retrieval

Paper • 2412.14835 • Published 6 days ago • 66
