The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about...
https://github.com/nivibilla/local-llasa-tts/blob/main/llasa_vllm_longtext_inference.ipynb
The vLLM code is up.
Here is a sample of batched, chunked inference. It's quite good.
It replicates what's given in the sample audio, so if your sample audio is sad, the output will be sad.
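For anyone wondering what the "chunked" part means in practice, here is a rough sketch of the idea, not the notebook's actual code: split the long text at sentence boundaries into chunks under a character budget, so that each chunk can go into the batch as its own prompt. The function name `chunk_text` and the `max_chars` default are assumptions made up for illustration.

```python
import re


def chunk_text(text, max_chars=200):
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept as one oversized chunk
    rather than being cut mid-sentence.
    """
    # Split after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    for i, chunk in enumerate(chunk_text("One. Two. Three.", max_chars=9)):
        print(i, chunk)
```

Each chunk would then be synthesized as a separate prompt in one batch and the resulting audio concatenated in order.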
Yeah, the authors said the 8B will be out by the end of the month; not sure about the paper. I haven't heard of glm4voice tbf. I'll check it out.
I'm making some sample notebooks, so keep an eye on this repo. But yeah, xcodec2 has very strict requirements. What I do is install that first and then vLLM, and it works fine.