larger models

#1
by ctranslate2-4you - opened

Are there any plans to support larger models, such as large-v2, or quantized models like those implemented by ctranslate2, faster-whisper, whisperx, and so on? Or perhaps this already exists and I just missed it?

WhisperSpeech org

Hey, we did not see performance improve much when moving to larger models. We'll probably revisit this once we have more conditioning options (emotions, prosody, etc.).

As for CTranslate2/faster-whisper, we'd love to have our models running there, but we did not have the resources to do it ourselves. For now we rely on torch.compile to improve inference performance.
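
For readers unfamiliar with torch.compile, here is a minimal sketch of the general pattern. It is not WhisperSpeech's actual code: `TinyModel` and `compile_for_inference` are hypothetical names made up for illustration, and the `"eager"` backend is chosen only so the snippet runs without a C++ compiler (the default `"inductor"` backend is the one that usually gives speedups).

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical stand-in model; not part of WhisperSpeech."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def compile_for_inference(model: nn.Module, backend: str = "inductor") -> nn.Module:
    """Switch the model to eval mode and wrap it with torch.compile."""
    model.eval()
    return torch.compile(model, backend=backend)


torch.manual_seed(0)
model = TinyModel()

# Assumption: backend="eager" skips code generation, so no compiler toolchain
# is needed. Swap in the default "inductor" backend to look for real speedups.
compiled = compile_for_inference(model, backend="eager")

x = torch.randn(4, 16)
with torch.no_grad():
    reference = model(x)  # output of the uncompiled model
    output = compiled(x)  # first call triggers graph capture
```

The first call to a compiled model is slower because it captures the graph, so any benefit only shows up on repeated calls with similarly shaped inputs.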