CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models Paper • 2410.13267 • Published Oct 17, 2024 • 1
Memories are One-to-Many Mapping Alleviators in Talking Face Generation Paper • 2212.05005 • Published Dec 9, 2022
DAE-Talker: High Fidelity Speech-Driven Talking Face Generation with Diffusion Autoencoder Paper • 2303.17550 • Published Mar 30, 2023
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis Paper • 2502.04128 • Published Feb 6, 2025 • 25
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling Paper • 2406.04321 • Published Jun 6, 2024
The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation Paper • 2503.04606 • Published 7 days ago • 7
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey Paper • 2412.18619 • Published Dec 16, 2024 • 55
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior Paper • 2106.06406 • Published Jun 11, 2021
MuPT: A Generative Symbolic Music Pretrained Transformer Paper • 2404.06393 • Published Apr 9, 2024 • 16
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models Paper • 2406.01375 • Published Jun 3, 2024
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation Paper • 2405.15758 • Published May 24, 2024 • 1
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS Paper • 2406.18009 • Published Jun 26, 2024 • 23
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms Paper • 2406.14228 • Published Jun 20, 2024 • 1
Make Your Actor Talk: Generalizable and High-Fidelity Lip Sync with Motion and Appearance Disentanglement Paper • 2406.08096 • Published Jun 12, 2024
PromptTTS: Controllable Text-to-Speech with Text Descriptions Paper • 2211.12171 • Published Nov 22, 2022
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers Paper • 2406.05370 • Published Jun 8, 2024 • 19
Mitigating Reversal Curse in Large Language Models via Semantic-aware Permutation Training Paper • 2403.00758 • Published Mar 1, 2024 • 2