Update demo.py
demo.py
CHANGED
@@ -317,11 +317,13 @@ with gr.Blocks() as longform:
 outputs=[audio_longform],
 concurrency_limit=4)
 
-# --- User Guide / Info Tab (Reformatted User Text) ---
-# Convert Markdown-like text to basic HTML for styling
 user_guide_html = f"""
 <div style="background-color: rgba(30, 30, 30, 0.9); color: #f0f0f0; padding: 20px; border-radius: 10px; border: 1px solid #444;">
 <h2 style="border-bottom: 1px solid #555; padding-bottom: 5px;">Quick Notes:</h2>
+
+<p> This is run on a single RTX 3090. </p>
+<p> These networks can only generate natural speech with correct intonations (i.e generating NSFW, non-speech sounds, stutters etc. doesn't work.) </p>
+<p> I will gradually update here and -> [Github](https://github.com/Respaired/Project_Kalliope) </p>
 <p>Everything in this demo & the repo (coming soon) is experimental. The main idea is just playing around with different things to see what works when you're limited to training on a pair of RTX 3090s.</p>
 <p>The data used for the english model is rough and pretty tough for any TTS model (think debates, real conversations, plus a little bit of cleaner professional performances). It mostly comes from public sources or third parties (no TOS signed). I'll probably write a blog post later with more details.</p>
 <p>So far I focused on English and Russian, more can be covered.</p>
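Review note on the added GitHub line: `[Github](...)` is Markdown link syntax, and inside a raw HTML `<p>` element it will most likely be displayed as literal text rather than a clickable link. This assumes `user_guide_html` ends up rendered as HTML, which this hunk does not show. A minimal sketch of one possible fix follows; `md_links_to_html` is a hypothetical helper for illustration and is not part of demo.py.

```python
import html
import re

# Matches Markdown inline links of the form [label](http://...) or [label](https://...).
_MD_LINK = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def md_links_to_html(text: str) -> str:
    """Replace Markdown inline links in `text` with HTML anchor tags.

    Hypothetical helper for illustration; not part of demo.py.
    """
    def _to_anchor(match: re.Match) -> str:
        # Escape both parts so they are safe inside the generated tag.
        label = html.escape(match.group(1))
        url = html.escape(match.group(2), quote=True)
        return f'<a href="{url}" target="_blank" rel="noopener">{label}</a>'

    return _MD_LINK.sub(_to_anchor, text)


line = "<p> I will gradually update here and -> [Github](https://github.com/Respaired/Project_Kalliope) </p>"
print(md_links_to_html(line))
```

The simpler alternative is to write the `<a href="...">` tag directly in the f-string; the helper is only worth it if more Markdown-style links are expected in the guide text.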