EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published Sep 17 • 18
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model Paper • 2406.04904 • Published Jun 7 • 4
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis Paper • 2404.19622 • Published Apr 30 • 2
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm Paper • 2403.11781 • Published Mar 18 • 17
Schrödinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis Paper • 2312.03491 • Published Dec 6, 2023 • 33
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis Paper • 2306.09417 • Published Jun 15, 2023 • 3
Matcha-TTS: A fast TTS architecture with conditional flow matching Paper • 2309.03199 • Published Sep 6, 2023 • 11