metadata

title: StyleTTS2 Studio
emoji: 🔥
colorFrom: purple
colorTo: yellow
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: mit
short_description: Build custom voices in StyleTTS 2

StyleTTS2 Studio

Customizable voices for the StyleTTS2 text-to-speech model, based on StyleTTS2 and artificial StyleTTS2.

I used Label Studio to label 50 randomly generated voices on six features: Gender, Tone, Quality, Pace, Enunciation, and Style. These six features were used in a Principal Component Analysis (PCA) to reduce the 256-dimensional style vector to a manageable number of dimensions. The results could likely be improved by choosing better features and labeling more samples, but these first results show that it is generally possible to dial in specific voice features.
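
To make the dimensionality-reduction step concrete, here is a minimal sketch of a PCA over 256-dimensional style vectors. It is not the code from app.py: the function names (`fit_pca`, `to_sliders`, `from_sliders`) are hypothetical, random numbers stand in for the 50 real style vectors, and the choice of 6 components simply mirrors the number of labeled features. How the Label Studio labels are tied to the components is not shown here, because the description above does not specify it.

```python
import numpy as np


def fit_pca(vectors, n_components):
    """Fit a PCA with SVD and return (mean, components)."""
    mean = vectors.mean(axis=0)
    centered = vectors - mean
    # Rows of vt are orthonormal principal directions,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def to_sliders(style, mean, components):
    """Project a full style vector onto the reduced coordinates."""
    return (style - mean) @ components.T


def from_sliders(sliders, mean, components):
    """Rebuild a full style vector from the reduced coordinates."""
    return mean + sliders @ components


# Placeholder data: 50 random vectors instead of real StyleTTS2 styles.
rng = np.random.default_rng(0)
styles = rng.normal(size=(50, 256))

mean, components = fit_pca(styles, n_components=6)

# Nudge the first reduced coordinate and map back to 256 dimensions.
sliders = to_sliders(styles[0], mean, components)
sliders[0] += 1.0
new_style = from_sliders(sliders, mean, components)

print(components.shape, sliders.shape, new_style.shape)
```

In this sketch each reduced coordinate acts like a slider: changing one value and calling `from_sliders` yields a new 256-dimensional vector that could, in principle, be passed to the model as a style. Whether a given coordinate corresponds to a recognizable feature such as Pace or Tone depends on the real data and labels.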

Disclaimer from the original StyleTTS2 repo:

Pre-Trained Models: Before using these pre-trained models, you agree to inform the listeners that the speech samples are synthesized by the pre-trained models, unless you have the permission to use the voice you synthesize. That is, you agree to only use voices whose speakers grant the permission to have their voice cloned, either directly or by license before making synthesized voices public, or you have to publicly announce that these voices are synthesized if you do not have the permission to use these voices.