Text-to-Speech
coqui

how to adjust the speed in synthesize

#1
by s8600863 - opened

How do I adjust the speed in synthesize when using the model directly? Thanks.

Coqui.ai org

good question I think we forgot to implement that :)

Coqui.ai org

next release will be with the fix

i see, thanks

Great to see this feature in the future

Coqui.ai org

we released the speed adjustment in 🐸TTS

erogol changed discussion status to closed

@erogol where are the docs on how to adjust speed in xtts?

Anything new?

For anyone wondering how to set the model speed, as this appears to be missing from the documentation: you need to load the model directly, as follows.

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts
import soundfile as sf

config = XttsConfig()
config.load_json("config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="model/", eval=True)
# model.cuda()  # uncomment to run the model on GPU

outputs = model.synthesize(
    "This is Stefon Alfaro, I really said this. The sky is blue. Computers are good. Test 1 2 3 4.",
    config,
    speaker_wav="StefonNewMicSample.wav",
    gpt_cond_len=3,
    language="en",
    speed=1.5
)

#print(outputs)

# Extract the audio waveform from the 'wav' key.
raw_audio = outputs['wav']
# Use a predefined or configured sample rate. You might need to adjust this value.
sample_rate = 24000  # This is a common sample rate for TTS models, but check your model's configuration.

# Define the path where you want to save the audio file.
output_path = 'output2.wav'

# Save the audio data to a WAV file.
sf.write(output_path, raw_audio, sample_rate)
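A quick way to confirm that the speed argument actually changed the output (rather than being silently ignored) is to compare clip durations. This is a small sketch using synthetic arrays; the lengths below are made-up stand-ins for the raw_audio produced above.

```python
import numpy as np

def duration_seconds(audio, sample_rate):
    """Duration of a mono waveform in seconds."""
    return len(audio) / sample_rate

sample_rate = 24000
normal = np.zeros(48000)  # pretend output at speed=1.0 (2.0 s)
fast = np.zeros(32000)    # pretend output at speed=1.5 (~1.33 s)

print(duration_seconds(normal, sample_rate))  # 2.0
print(duration_seconds(fast, sample_rate))
# If speed=1.5 is honored, the fast clip should be roughly 1/1.5 the length.
```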

Note that this speed parameter only has an effect on Coqui Studio models. You can see this information in the Python function's docstring.
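If your TTS version ignores the speed keyword, one workaround is to time-compress the waveform after synthesis. The simplest approach is linear-interpolation resampling; note this also shifts pitch, unlike a proper time-stretch (e.g. librosa.effects.time_stretch). A minimal, hedged sketch with NumPy; change_speed is a helper name invented here, applied to the raw_audio from the snippet above:

```python
import numpy as np

def change_speed(audio, speed):
    """Resample a mono waveform by linear interpolation.

    speed > 1.0 shortens the clip (faster playback at the same
    sample rate); pitch shifts along with speed.
    """
    audio = np.asarray(audio, dtype=np.float32)
    n_out = int(round(len(audio) / speed))
    # Positions in the original signal for each output sample.
    positions = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(positions, np.arange(len(audio)), audio)

# Example: a 1-second clip at 24 kHz becomes ~0.67 s at speed=1.5.
clip = np.random.randn(24000).astype(np.float32)
faster = change_speed(clip, 1.5)
print(len(faster))  # 16000
```

The result can be written out with sf.write just like the original raw_audio.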
