Plachta/VALL-E-X · Apply for community grant: Academic project (gpu)

Hi, dear HF team! 😃

This is an open-source implementation of Microsoft's latest text-to-speech model, VALL-E X 🎙️, from the paper "Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling" 🌐. It is basically a GPT-style model with 24 layers and a hidden size of 1024 💻. I have made a demo page for this model for your inspection and consideration 📋.

It takes about 60 s to synthesize 6 s of speech on the free CPU hardware ⏳, but only 2 to 3 s on a single RTX 3060 ⚡. I sincerely hope that you can grant GPU resources for the Hugging Face Space of this project, so that more people have the chance to play with this awesome model 🚀.
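To put the timings above in perspective, here is a rough sketch that turns them into real-time factors (synthesis time divided by audio length) and a speedup range. The function name and variables are only illustrative, and the inputs are the approximate figures quoted above, not a careful benchmark.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Seconds of compute needed per second of generated audio (lower is better)."""
    return synthesis_seconds / audio_seconds


# Approximate figures from this post (assumed, not a rigorous benchmark).
AUDIO_SECONDS = 6.0
CPU_SECONDS = 60.0
GPU_SECONDS_FAST = 2.0  # lower end of the quoted range on an RTX 3060
GPU_SECONDS_SLOW = 3.0  # upper end of the quoted range on an RTX 3060

cpu_rtf = real_time_factor(CPU_SECONDS, AUDIO_SECONDS)
gpu_rtf_fast = real_time_factor(GPU_SECONDS_FAST, AUDIO_SECONDS)
gpu_rtf_slow = real_time_factor(GPU_SECONDS_SLOW, AUDIO_SECONDS)

# Speedup of the GPU over the free CPU, for both ends of the range.
speedup_min = CPU_SECONDS / GPU_SECONDS_SLOW
speedup_max = CPU_SECONDS / GPU_SECONDS_FAST

print(f"CPU real-time factor: {cpu_rtf:.2f}")
print(f"GPU real-time factor: {gpu_rtf_fast:.2f} to {gpu_rtf_slow:.2f}")
print(f"GPU speedup over CPU: {speedup_min:.0f}x to {speedup_max:.0f}x")
```

In other words, on CPU the demo runs well slower than real time, while on the GPU it generates speech faster than it takes to play back.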

Best Regards!🤗