- Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
  Paper • 2406.06525 • Published • 64
- Husky: A Unified, Open-Source Language Agent for Multi-Step Reasoning
  Paper • 2406.06469 • Published • 23
- Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
  Paper • 2406.04271 • Published • 27
- Block Transformer: Global-to-Local Language Modeling for Fast Inference
  Paper • 2406.02657 • Published • 36
Collections including paper arxiv:2405.17405

- Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer
  Paper • 2405.17405 • Published • 14
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12
- 4Diffusion: Multi-view Video Diffusion Model for 4D Generation
  Paper • 2405.20674 • Published • 11
- 4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models
  Paper • 2406.07472 • Published • 10

- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
  Paper • 2403.01807 • Published • 7
- TripoSR: Fast 3D Object Reconstruction from a Single Image
  Paper • 2403.02151 • Published • 12
- OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
  Paper • 2403.01779 • Published • 28
- MagicClay: Sculpting Meshes With Generative Neural Fields
  Paper • 2403.02460 • Published • 6

- EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
  Paper • 2402.17485 • Published • 188
- VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior
  Paper • 2312.01841 • Published • 1
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
  Paper • 2311.16498 • Published • 1
- GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
  Paper • 2312.02134 • Published • 2

- Seamless Human Motion Composition with Blended Positional Encodings
  Paper • 2402.15509 • Published • 14
- TripoSR: Fast 3D Object Reconstruction from a Single Image
  Paper • 2403.02151 • Published • 12
- 3D-VLA: A 3D Vision-Language-Action Generative World Model
  Paper • 2403.09631 • Published • 7
- Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting
  Paper • 2403.09981 • Published • 6

- MM-LLMs: Recent Advances in MultiModal Large Language Models
  Paper • 2401.13601 • Published • 44
- A Touch, Vision, and Language Dataset for Multimodal Alignment
  Paper • 2402.13232 • Published • 13
- Neural Network Diffusion
  Paper • 2402.13144 • Published • 94
- FlashTex: Fast Relightable Mesh Texturing with LightControlNet
  Paper • 2402.13251 • Published • 13

- WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
  Paper • 2401.09985 • Published • 14
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
  Paper • 2401.09962 • Published • 7
- Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
  Paper • 2401.10404 • Published • 10
- ActAnywhere: Subject-Aware Video Background Generation
  Paper • 2401.10822 • Published • 13

- DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors
  Paper • 2312.16837 • Published • 5
- Learning the 3D Fauna of the Web
  Paper • 2401.02400 • Published • 9
- Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model
  Paper • 2310.15110 • Published • 2
- Zero-1-to-3: Zero-shot One Image to 3D Object
  Paper • 2303.11328 • Published • 5

- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
  Paper • 2306.07967 • Published • 24
- Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
  Paper • 2306.07954 • Published • 113
- TryOnDiffusion: A Tale of Two UNets
  Paper • 2306.08276 • Published • 72
- Seeing the World through Your Eyes
  Paper • 2306.09348 • Published • 32