Fast image relighting using Latent Bridge Matching
Conversational speech generation
Convert one voice to match another using reference audio
A text-to-speech model powered by SparkAudio and Mobvoi
Blazingly Fast and Embarrassingly Simple Song Generation
Generate images with virtual try-on or pose transfer
Large Language Diffusion Models