Introduction 🐢

TorToiSe is a text-to-speech program built in April 2022 by jbetker@. TorToiSe is open source, with trained model weights available at https://github.com/neonbjb/tortoise-tts

This page demonstrates some of the results of TorToiSe.

Handpicked results 🐢

The following are several particularly good results generated by the model.

Short-form

Compared to Tacotron2 (with the LJSpeech voice): 🐢

LJSpeech is a popular dataset used to train small-scale TTS models. TorToiSe is a multi-voice model; the following shows how it renders the LJSpeech voice with no fine-tuning, compared with results for the same text from the popular Tacotron2 model paired with the WaveGlow vocoder:

Tacotron2+Waveglow	TorToiSe	TorToiSe Finetuned

All Results 🐢

The following are all the results from which the hand-picked results were drawn. Also included is the reference audio that the program is trying to mimic. This will give you a better sense of how TorToiSe really performs.

Short-form

text	angie	daniel	deniro	emma	freeman	geralt	halle	jlaw	lj	myself	pat	snakes	tom	train_atkins	train_dotrice	train_kennard	weaver	william
reference clip
autoregressive_ml
bengio_it_needs_to_know_what_is_bad
dickinson_stop_for_death
espn_basketball
frost_oar_to_oar
frost_road_not_taken
gatsby_and_so_we_beat_on
harrypotter_differences_of_habit_and_language
i_am_a_language_model
melodie_kao
nyt_covid
real_courage_is_when_you_know_your_licked
rolling_stone_review
spacecraft_interview
tacotron2_sample1
tacotron2_sample2
tacotron2_sample3
tacotron2_sample4
watts_this_is_the_real_secret_of_life
wilde_nowadays_people_know_the_price

Long-form

Angelina:

Craig:

Deniro:

Emma:

Freeman:

Geralt:

Halle:

Jlaw:

LJ:

Myself:

Pat:

Snakes:

Tom:

Weaver:

William:

Prompt Engineering 🐢

TorToiSe is capable of "prompt engineering", in that tone and prosody are affected by the emotion expressed in the words fed to the program. For example, prompting the model with "[I am so angry,] I went to the park and threw a ball" will result in it outputting "I went to the park and threw a ball" in an angry tone: the bracketed text steers the delivery but is not spoken.
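As a rough illustration of the prompt format, here is a minimal Python sketch that builds a bracketed prompt and recovers the text expected to be spoken. The helper names `with_emotion_prompt` and `expected_spoken_text` are hypothetical and are not part of the TorToiSe codebase; the sketch only assumes, as in the example above, that text inside square brackets sets the tone and is left out of the audio.

```python
import re

# Hypothetical helpers for illustration only; not part of TorToiSe itself.
# Assumption: text inside [square brackets] steers the emotion and is not spoken.
_BRACKETED = re.compile(r"\[[^\]]*\]")


def with_emotion_prompt(text: str, emotion_cue: str) -> str:
    """Prefix `text` with a bracketed emotional cue."""
    return f"[{emotion_cue}] {text}"


def expected_spoken_text(prompted: str) -> str:
    """Return the part of the prompt outside the brackets, with whitespace tidied."""
    without_cues = _BRACKETED.sub("", prompted)
    return " ".join(without_cues.split())


prompted = with_emotion_prompt("I went to the park and threw a ball", "I am so angry,")
print(prompted)                        # the full prompt handed to the model
print(expected_spoken_text(prompted))  # the words expected in the audio
```

The string produced by `with_emotion_prompt` is what would be passed to the model as its input text; `expected_spoken_text` is only a convenience for checking a transcript against what should have been said.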

The following are a few examples of different prompts. The effect is subtle but definitely present. Many voices are less affected by this.

Angry:

Sad:

Happy:

Scared: