https://huggingface.co/papers/2501.03006
Audio Conditioned LipSync with Latent Diffusion Models
Create top-quality 3D (.GLB) models from text or images
FLUX 3D StyleGEN
Create videos with FFmpeg + Qwen2.5-Coder
Text to Audio (Sound SFX) Generator
Restylize & repose person ID