Controllable Text Generation for Large Language Models: A Survey Paper • 2408.12599 • Published Aug 22 • 63
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published Aug 22 • 35
Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published Aug 22 • 15
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20 • 58
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning Paper • 2408.11001 • Published Aug 20 • 11
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Paper • 2408.14176 • Published Aug 26 • 60
Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation Paper • 2408.15991 • Published Aug 28 • 15
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published Aug 29 • 30
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution Paper • 2310.16834 • Published Oct 25, 2023 • 4
VisionTS: Visual Masked Autoencoders Are Free-Lunch Zero-Shot Time Series Forecasters Paper • 2408.17253 • Published Aug 30 • 37
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos Paper • 2409.02095 • Published Sep 3 • 35
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4 • 90
FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation Paper • 2409.02245 • Published Sep 3 • 9
Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing Paper • 2409.01322 • Published Sep 2 • 94
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation Paper • 2409.03718 • Published Sep 5 • 25
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task Paper • 2409.04005 • Published Sep 6 • 18
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis Paper • 2409.06135 • Published Sep 10 • 14
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models Paper • 2409.07452 • Published Sep 11 • 20
Instant Facial Gaussians Translator for Relightable and Interactable Facial Rendering Paper • 2409.07441 • Published Sep 11 • 10
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published Sep 12 • 18
DreamHOI: Subject-Driven Generation of 3D Human-Object Interactions with Diffusion Priors Paper • 2409.08278 • Published Sep 12 • 10
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally Paper • 2409.08270 • Published Sep 12 • 9
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos Paper • 2409.08353 • Published Sep 12 • 10
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published Sep 13 • 31
A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis Paper • 2409.08947 • Published Sep 13 • 11
DrawingSpinUp: 3D Animation from Single Character Drawings Paper • 2409.08615 • Published Sep 13 • 17
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13 • 49
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published Sep 17 • 25
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published Sep 17 • 28
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published Sep 17 • 13
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published Sep 17 • 18
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction Paper • 2409.11211 • Published Sep 17 • 8
Single-Layer Learnable Activation for Implicit Neural Representation (SL^{2}A-INR) Paper • 2409.10836 • Published Sep 17 • 4
Implicit Neural Representations with Fourier Kolmogorov-Arnold Networks Paper • 2409.09323 • Published Sep 14 • 5
Towards Diverse and Efficient Audio Captioning via Diffusion Models Paper • 2409.09401 • Published Sep 14 • 6
LVCD: Reference-based Lineart Video Colorization with Diffusion Models Paper • 2409.12960 • Published Sep 19 • 23
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Paper • 2409.12957 • Published Sep 19 • 18
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt Paper • 2409.12892 • Published Sep 19 • 5
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation Paper • 2409.12532 • Published Sep 19 • 5
FlexiTex: Enhancing Texture Generation with Visual Guidance Paper • 2409.12431 • Published Sep 19 • 11
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published Sep 24 • 32
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions Paper • 2409.15278 • Published Sep 23 • 22
MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors Paper • 2409.15273 • Published Sep 23 • 10
MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting Paper • 2409.14393 • Published Sep 22 • 8
SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending Paper • 2409.13926 • Published Sep 20 • 5
Portrait Video Editing Empowered by Multimodal Generative Priors Paper • 2409.13591 • Published Sep 20 • 15
Colorful Diffuse Intrinsic Image Decomposition in the Wild Paper • 2409.13690 • Published Sep 20 • 12
V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians Paper • 2409.13648 • Published Sep 20 • 9
Reflecting Reality: Enabling Diffusion Models to Produce Faithful Mirror Reflections Paper • 2409.14677 • Published Sep 23 • 14
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction Paper • 2409.18124 • Published Sep 26 • 32
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image Paper • 2409.17280 • Published Sep 25 • 9
DreamWaltz-G: Expressive 3D Gaussian Avatars from Skeleton-Guided 2D Diffusion Paper • 2409.17145 • Published Sep 25 • 13
Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors Paper • 2409.17058 • Published Sep 25 • 11
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27 • 25
Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration Paper • 2410.00418 • Published Oct 1 • 9
SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs Paper • 2410.00337 • Published Oct 1 • 10
DressRecon: Freeform 4D Human Reconstruction from Monocular Video Paper • 2409.20563 • Published Sep 30 • 7
Flex3D: Feed-Forward 3D Generation With Flexible Reconstruction Model And Input View Curation Paper • 2410.00890 • Published Oct 1 • 18
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1 • 144
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Paper • 2410.01731 • Published Oct 2 • 16
3DGS-DET: Empower 3D Gaussian Splatting with Boundary Guidance and Box-Focused Sampling for 3D Object Detection Paper • 2410.01647 • Published Oct 2 • 28
HarmoniCa: Harmonizing Training and Inference for Better Feature Cache in Diffusion Transformer Acceleration Paper • 2410.01723 • Published Oct 2 • 4
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models Paper • 2410.02416 • Published Oct 3 • 26
PHI-S: Distribution Balancing for Label-Free Multi-Teacher Distillation Paper • 2410.01680 • Published Oct 2 • 32
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control Paper • 2410.00316 • Published Oct 1 • 4
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide Paper • 2410.04364 • Published Oct 6 • 28
Presto! Distilling Steps and Layers for Accelerating Music Generation Paper • 2410.05167 • Published Oct 7 • 15
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction Paper • 2410.04932 • Published Oct 7 • 9
RoCoTex: A Robust Method for Consistent Texture Synthesis with Diffusion Models Paper • 2409.19989 • Published Sep 30 • 17
Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach Paper • 2410.03160 • Published Oct 4 • 4
SePPO: Semi-Policy Preference Optimization for Diffusion Alignment Paper • 2410.05255 • Published Oct 7 • 4
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation Paper • 2410.07171 • Published Oct 9 • 41
Pyramidal Flow Matching for Efficient Video Generative Modeling Paper • 2410.05954 • Published Oct 8 • 38
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation Paper • 2410.05591 • Published Oct 8 • 13
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning Paper • 2410.05664 • Published Oct 8 • 7
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published Oct 9 • 42
DICE: Discrete Inversion Enabling Controllable Editing for Multinomial Diffusion and Masked Generative Models Paper • 2410.08207 • Published Oct 10 • 18
Semantic Score Distillation Sampling for Compositional Text-to-3D Generation Paper • 2410.09009 • Published Oct 11 • 13
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation Paper • 2410.08159 • Published Oct 10 • 25
Rectified Diffusion: Straightness Is Not Your Need in Rectified Flow Paper • 2410.07303 • Published Oct 9 • 18
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler Paper • 2410.05651 • Published Oct 8 • 13
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published Oct 14 • 54
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Paper • 2410.10774 • Published Oct 14 • 25
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations Paper • 2410.10792 • Published Oct 14 • 29
Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies Paper • 2410.10803 • Published Oct 14 • 6
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices Paper • 2410.11795 • Published Oct 15 • 16
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26 • 23
Continuous Speech Synthesis using per-token Latent Diffusion Paper • 2410.16048 • Published Oct 21 • 29
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Paper • 2410.19355 • Published Oct 25 • 23
Scaling Diffusion Language Models via Adaptation from Autoregressive Models Paper • 2410.17891 • Published Oct 23 • 15
Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion Paper • 2410.13674 • Published Oct 17 • 15
DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion Paper • 2411.04928 • Published Nov 7 • 48
SVDQunat: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models Paper • 2411.05007 • Published Nov 7 • 16
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation Paper • 2411.04989 • Published Nov 7 • 14
Controlling Language and Diffusion Models by Transporting Activations Paper • 2410.23054 • Published Oct 30 • 16
DreamPolish: Domain Score Distillation With Progressive Geometry Generation Paper • 2411.01602 • Published Nov 3 • 10
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D Paper • 2411.02336 • Published Nov 4 • 23
Scaling Properties of Diffusion Models for Perceptual Tasks Paper • 2411.08034 • Published Nov 12 • 13
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings Paper • 2411.08017 • Published Nov 12 • 11
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models Paper • 2411.07232 • Published Nov 11 • 62
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11 • 28
GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation Paper • 2411.08033 • Published Nov 12 • 22
Stylecodes: Encoding Stylistic Information For Image Generation Paper • 2411.12811 • Published Nov 19 • 11
Material Anything: Generating Materials for Any 3D Object via Diffusion Paper • 2411.15138 • Published Nov 22 • 42
DreamRunner: Fine-Grained Storytelling Video Generation with Retrieval-Augmented Motion Adaptation Paper • 2411.16657 • Published 30 days ago • 17
OminiControl: Minimal and Universal Control for Diffusion Transformer Paper • 2411.15098 • Published Nov 22 • 53
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models Paper • 2411.18613 • Published 28 days ago • 50
Identity-Preserving Text-to-Video Generation by Frequency Decomposition Paper • 2411.17440 • Published 29 days ago • 35
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving Paper • 2411.15139 • Published Nov 22 • 15
Diffusion Self-Distillation for Zero-Shot Customized Image Generation Paper • 2411.18616 • Published 28 days ago • 15
Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis Paper • 2411.17769 • Published 29 days ago • 7