parler-tts-streaming

sanchit-gandhi committed on Apr 24, 2024

Commit 1700872

1 Parent(s): 6f5cea7

description

Files changed (1)

app.py +9 -6

app.py CHANGED

@@ -57,8 +57,8 @@ examples = [
 jenny_examples = [
     [
-        "Remember - this is only the first iteration of the model! To improve the prosody and naturalness of the speech further, we're scaling up the amount of training data by a factor of five times.",
-        "Jenny speaks at a fast pace in a small, confined space with a very clear audio and an animated tone.",
+        "Remember, this is only the first iteration of the model! To improve the prosody and naturalness of the speech further, we're scaling up the amount of training data by a factor of five times.",
+        "Jenny speaks at an average pace with a slightly animated delivery in a very confined sounding environment with clear audio quality.",
         2.5,
     ],
     [
@@ -73,7 +73,7 @@ jenny_examples = [
     ],
     [
         "Montrose also, after having experienced still more variety of good and bad fortune, threw down his arms, and retired out of the kingdom.",
-        "Jenny delivers words at a fast pace and an animated tone, in a very spacious environment, accompanied by noticeable background noise.",
+        "Jenny delivers her words at a fast pace and an animated tone, in a very spacious environment, accompanied by noticeable background noise.",
         2.5,
     ],
 ]
@@ -323,15 +323,18 @@ with gr.Blocks(css=css) as block:
     gr.HTML(
         f"""
         <p><a href="https://github.com/huggingface/parler-tts"> Parler-TTS</a> is a training and inference library for
-        high-fidelity text-to-speech (TTS) models. The model demonstrated here, <a href="https://huggingface.co/parler-tts/parler_tts_mini_v0.1"> Parler-TTS Mini v0.1</a>,
-        is the first iteration model trained using 10k hours of narrated audiobooks. It generates high-quality speech
-        with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).</p>
+        high-fidelity text-to-speech (TTS) models. Two models are demonstrated here, <a href="https://huggingface.co/parler-tts/parler_tts_mini_v0.1"> Parler-TTS Mini v0.1</a>,
+        is the first iteration model trained using 10k hours of narrated audiobooks, and <a href="https://huggingface.co/ylacombe/parler-tts-mini-jenny-30H"> Parler-TTS Jenny</a>,
+        a model fine-tuned on the <a href="https://huggingface.co/datasets/reach-vb/jenny_tts_dataset"> Jenny dataset</a>.</p>
+        <p>Both models generates high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation).</p>
         <p>Tips for ensuring good generation:
         <ul>
             <li>Include the term "very clear audio" to generate the highest quality audio, and "very noisy audio" for high levels of background noise</li>
             <li>Punctuation can be used to control the prosody of the generations, e.g. use commas to add small breaks in speech</li>
             <li>The remaining speech features (gender, speaking rate, pitch and reverberation) can be controlled directly through the prompt</li>
+            <li>Include the term "Jenny" when using the fine-tuned Jenny model to pick out her voice</li>
         </ul>
         </p>
         """