How to adjust the speed in synthesize
#1, opened by s8600863
How can I adjust the speed in synthesize when using the model directly? Thanks.
Good question. I think we forgot to implement that :)
The next release will include the fix.
I see, thanks.
Great to see this feature in the future
We released the speed adjustment in 🐸TTS.
erogol changed discussion status to closed
Anything new?
For anyone wondering how to set the model speed, since this appears to be missing from the documentation: you need to load the model directly, like so.
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import soundfile as sf
config = XttsConfig()
config.load_json("config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="model/", eval=True)
#model.cuda()
outputs = model.synthesize(
    "This is Stefon Alfaro, I really said this. The sky is blue. Computers are good. Test 1 2 3 4.",
    config,
    speaker_wav="StefonNewMicSample.wav",
    gpt_cond_len=3,
    language="en",
    speed=1.5,
)
#print(outputs)
# Extract the audio waveform from the 'wav' key.
raw_audio = outputs['wav']
# Use a predefined or configured sample rate. You might need to adjust this value.
sample_rate = 24000 # This is a common sample rate for TTS models, but check your model's configuration.
# Define the path where you want to save the audio file.
output_path = 'output2.wav'
# Save the audio data to a WAV file.
sf.write(output_path, raw_audio, sample_rate)
This speed parameter only has an effect on Coqui Studio models. You can see this in the Python function's description.
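If you want to check whether speed changes anything on your own setup, one option is to synthesize the same text at two speed values and compare the clip lengths. Below is a minimal sketch, not part of the library: `durations_by_speed` and the `synth` callable are hypothetical names introduced here, and the 24000 default is just the sample rate used in the post above.

```python
from typing import Callable, Dict, Sequence


def durations_by_speed(
    synth: Callable[[float], Sequence[float]],
    speeds: Sequence[float],
    sample_rate: int = 24000,  # assumption: same rate as in the post above
) -> Dict[float, float]:
    """Return the clip duration in seconds for each speed value.

    `synth` is a hypothetical callable that takes a speed value and
    returns the raw waveform (a sequence of samples) for a fixed text.
    """
    return {speed: len(synth(speed)) / sample_rate for speed in speeds}


# Example wiring, assuming `model` and `config` are loaded as in the post above:
# synth = lambda s: model.synthesize(
#     "The sky is blue.", config,
#     speaker_wav="StefonNewMicSample.wav", language="en", speed=s,
# )["wav"]
# print(durations_by_speed(synth, [1.0, 1.5]))
```

If the durations come out roughly the same for every speed value, the parameter is probably being ignored by your model. If the clip gets shorter as speed goes up, it is being applied.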