Now in 5 languages!
Generate realistic talking heads from an image and audio
FitDiT is a high-fidelity virtual try-on model.
https://huggingface.co/papers/2501.03006
Audio-Conditioned LipSync with Latent Diffusion Models
InstantID-XS