Sprt98 committed
Commit fb73b5d · verified · 1 Parent(s): ab87236

Upload 2 files

Files changed (2)
  1. README.md +8 -46
  2. app.py +551 -0
README.md CHANGED
@@ -1,46 +1,8 @@
- <div align="center">
-
- <img alt="LOGO" src="https://avatars.githubusercontent.com/u/122017386" width="256" height="256" />
-
- # Bert-VITS2
-
- VITS2 Backbone with multilingual bert
-
- For a quick guide, please refer to `webui_preprocess.py`.
-
- ## [Project Recommendation]
- # [Fish-Speech](https://github.com/fishaudio/fish-speech), FishAudio's new autoregressive TTS, is now available. Its quality is at the current open-source SOTA level and it is actively maintained; we recommend it as a replacement for BV2/GSV. This project will not be maintained in the near term.
- ## Demo Video: https://www.bilibili.com/video/BV18E421371Q
- ## Tech slides Video: https://www.bilibili.com/video/BV1zJ4m1K7cj
- ## Please note that the core idea of this project comes from [anyvoiceai/MassTTS](https://github.com/anyvoiceai/MassTTS), an excellent TTS project
- ## The MassTTS demo is [ai版峰哥锐评峰哥本人,并找回了在金三角失落的腰子](https://www.bilibili.com/video/BV1w24y1c7z9)
-
- [//]: # (## This project has no relation to [PlayVoice/vits_chinese]&#40;https://github.com/PlayVoice/vits_chinese&#41;)
-
- [//]: # ()
- [//]: # (This repo began when a friend shared the AI 峰哥 video; I was amazed by the results, and after trying MassTTS myself I found fs trails vits somewhat in audio quality and its training pipeline is more complex than vits', so, following its approach, the bert)
-
- ## Experienced Travelers/Trailblazers/Captains/Doctors/sensei/Witchers/喵喵露/V should read the code and learn how to train on their own.
-
- ### It is strictly forbidden to use this project for any purpose that violates the Constitution, the Criminal Law, the Public Security Administration Punishments Law, or the Civil Code of the People's Republic of China.
- ### Use for any politics-related purpose is strictly forbidden.
- #### Video: https://www.bilibili.com/video/BV1hp4y1K78E
- #### Demo: https://www.bilibili.com/video/BV1TF411k78w
- #### QQ Group: 815818430
- ## References
- + [anyvoiceai/MassTTS](https://github.com/anyvoiceai/MassTTS)
- + [jaywalnut310/vits](https://github.com/jaywalnut310/vits)
- + [p0p4k/vits2_pytorch](https://github.com/p0p4k/vits2_pytorch)
- + [svc-develop-team/so-vits-svc](https://github.com/svc-develop-team/so-vits-svc)
- + [PaddlePaddle/PaddleSpeech](https://github.com/PaddlePaddle/PaddleSpeech)
- + [emotional-vits](https://github.com/innnky/emotional-vits)
- + [fish-speech](https://github.com/fishaudio/fish-speech)
- + [Bert-VITS2-UI](https://github.com/jiangyuxiaoxiao/Bert-VITS2-UI)
- ## Thanks to all contributors for their efforts
- <a href="https://github.com/fishaudio/Bert-VITS2/graphs/contributors" target="_blank">
-   <img src="https://contrib.rocks/image?repo=fishaudio/Bert-VITS2"/>
- </a>
-
- [//]: # (# All code references in this project are clearly attributed; the idea for the bert part comes from [AI峰哥]&#40;https://www.bilibili.com/video/BV1w24y1c7z9&#41; and has no relation to [vits_chinese]&#40;https://github.com/PlayVoice/vits_chinese&#41;. Everyone is welcome to review the code. We also strongly condemn that developer's [trolling, and even doxxing of developers]&#40;https://www.bilibili.com/read/cv27101514/&#41;.)
 
+ ---
+ license: mit
+ title: 崩坏:星穹铁道-AI知更鸟
+ sdk: gradio
+ emoji: 🌍
+ colorFrom: purple
+ colorTo: purple
+ ---
 
app.py ADDED
@@ -0,0 +1,551 @@
+ # flake8: noqa: E402
+ import os
+ import logging
+ import re_matching
+ from tools.sentence import split_by_language
+
+ logging.getLogger("numba").setLevel(logging.WARNING)
+ logging.getLogger("markdown_it").setLevel(logging.WARNING)
+ logging.getLogger("urllib3").setLevel(logging.WARNING)
+ logging.getLogger("matplotlib").setLevel(logging.WARNING)
+
+ logging.basicConfig(
+     level=logging.INFO, format="| %(name)s | %(levelname)s | %(message)s"
+ )
+
+ logger = logging.getLogger(__name__)
+
+ import torch
+ import ssl
+
+ # Disable TLS certificate verification so the nltk download below also works in
+ # environments with broken or missing certificates.
+ ssl._create_default_https_context = ssl._create_unverified_context
+ import nltk
+
+ nltk.download("cmudict")
+ import utils
+ from infer import infer, latest_version, get_net_g, infer_multilang
+ import gradio as gr
+ import webbrowser
+ import numpy as np
+ from config import config
+ from tools.translate import translate
+ import librosa
+
+ net_g = None
+
+ device = config.webui_config.device
+ if device == "mps":
+     # Fall back to CPU for operators PyTorch has not implemented on Apple's MPS backend.
+     os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
+
+
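+ # Synthesize each single-language text slice and return a list of 16-bit clips.
+ # skip_start/skip_end are recomputed per slice, so only the first slice keeps its
+ # leading silence and only the last keeps its trailing silence.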
+ def generate_audio(
+     slices,
+     sdp_ratio,
+     noise_scale,
+     noise_scale_w,
+     length_scale,
+     speaker,
+     language,
+     reference_audio,
+     emotion,
+     style_text,
+     style_weight,
+     skip_start=False,
+     skip_end=False,
+ ):
+     audio_list = []
+     # silence = np.zeros(hps.data.sampling_rate // 2, dtype=np.int16)
+     with torch.no_grad():
+         for idx, piece in enumerate(slices):
+             skip_start = idx != 0
+             skip_end = idx != len(slices) - 1
+             audio = infer(
+                 piece,
+                 reference_audio=reference_audio,
+                 emotion=emotion,
+                 sdp_ratio=sdp_ratio,
+                 noise_scale=noise_scale,
+                 noise_scale_w=noise_scale_w,
+                 length_scale=length_scale,
+                 sid=speaker,
+                 language=language,
+                 hps=hps,
+                 net_g=net_g,
+                 device=device,
+                 skip_start=skip_start,
+                 skip_end=skip_end,
+                 style_text=style_text,
+                 style_weight=style_weight,
+             )
+             audio16bit = gr.processing_utils.convert_to_16_bit_wav(audio)
+             audio_list.append(audio16bit)
+     return audio_list
+
+
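+ # Same as generate_audio, but each slice carries its own language sequence and is
+ # rendered with infer_multilang for mixed-language input.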
+ def generate_audio_multilang(
+     slices,
+     sdp_ratio,
+     noise_scale,
+     noise_scale_w,
+     length_scale,
+     speaker,
+     language,
+     reference_audio,
+     emotion,
+     skip_start=False,
+     skip_end=False,
+ ):
+     audio_list = []
+     # silence = np.zeros(hps.data.sampling_rate // 2, dtype=np.int16)
+     with torch.no_grad():
+         for idx, piece in enumerate(slices):
+             skip_start = idx != 0
+             skip_end = idx != len(slices) - 1
+             audio = infer_multilang(
+                 piece,
+                 reference_audio=reference_audio,
+                 emotion=emotion,
+                 sdp_ratio=sdp_ratio,
+                 noise_scale=noise_scale,
+                 noise_scale_w=noise_scale_w,
+                 length_scale=length_scale,
+                 sid=speaker,
+                 language=language[idx],
+                 hps=hps,
+                 net_g=net_g,
+                 device=device,
+                 skip_start=skip_start,
+                 skip_end=skip_end,
+             )
+             audio16bit = gr.processing_utils.convert_to_16_bit_wav(audio)
+             audio_list.append(audio16bit)
+     return audio_list
+
+
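+ # Long-form synthesis: collapse blank lines, split the text into paragraphs (and
+ # optionally sentences), synthesize each piece, and join the clips with silence.
+ # The silence lengths are computed against a hard-coded 44100 Hz sampling rate.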
+ def tts_split(
+     text: str,
+     speaker,
+     sdp_ratio,
+     noise_scale,
+     noise_scale_w,
+     length_scale,
+     language,
+     cut_by_sent,
+     interval_between_para,
+     interval_between_sent,
+     reference_audio,
+     emotion,
+     style_text,
+     style_weight,
+ ):
+     while text.find("\n\n") != -1:
+         text = text.replace("\n\n", "\n")
+     text = text.replace("|", "")
+     para_list = re_matching.cut_para(text)
+     para_list = [p for p in para_list if p != ""]
+     audio_list = []
+     for p in para_list:
+         if not cut_by_sent:
+             audio_list += process_text(
+                 p,
+                 speaker,
+                 sdp_ratio,
+                 noise_scale,
+                 noise_scale_w,
+                 length_scale,
+                 language,
+                 reference_audio,
+                 emotion,
+                 style_text,
+                 style_weight,
+             )
+             silence = np.zeros(int(44100 * interval_between_para), dtype=np.int16)
+             audio_list.append(silence)
+         else:
+             audio_list_sent = []
+             sent_list = re_matching.cut_sent(p)
+             sent_list = [s for s in sent_list if s != ""]
+             for s in sent_list:
+                 audio_list_sent += process_text(
+                     s,
+                     speaker,
+                     sdp_ratio,
+                     noise_scale,
+                     noise_scale_w,
+                     length_scale,
+                     language,
+                     reference_audio,
+                     emotion,
+                     style_text,
+                     style_weight,
+                 )
+                 silence = np.zeros(int(44100 * interval_between_sent))
+                 audio_list_sent.append(silence)
+             if (interval_between_para - interval_between_sent) > 0:
+                 silence = np.zeros(
+                     int(44100 * (interval_between_para - interval_between_sent))
+                 )
+                 audio_list_sent.append(silence)
+             audio16bit = gr.processing_utils.convert_to_16_bit_wav(
+                 np.concatenate(audio_list_sent)
+             )  # volume-normalize over the complete sentence group
+             audio_list.append(audio16bit)
+     audio_concat = np.concatenate(audio_list)
+     return ("Success", (hps.data.sampling_rate, audio_concat))
+
+
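+ # Parse one "mix"-format slice (a list of (lang, content) pairs with the speaker
+ # appended, e.g. from "[说话人]<zh>你好 <en>Hello") into parallel text and
+ # language groups; "|" inside a segment starts a new group.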
+ def process_mix(slice):
+     _speaker = slice.pop()
+     _text, _lang = [], []
+     for lang, content in slice:
+         content = content.split("|")
+         content = [part for part in content if part != ""]
+         if len(content) == 0:
+             continue
+         if len(_text) == 0:
+             _text = [[part] for part in content]
+             _lang = [[lang] for part in content]
+         else:
+             _text[-1].append(content[0])
+             _lang[-1].append(lang)
+             if len(content) > 1:
+                 _text += [[part] for part in content[1:]]
+                 _lang += [[lang] for part in content[1:]]
+     return _text, _lang, _speaker
+
+
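+ # Language auto-detection: split on "|", then segment each part into zh/ja/en
+ # runs with split_by_language, returning nested text and upper-cased language lists.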
+ def process_auto(text):
+     _text, _lang = [], []
+     for slice in text.split("|"):
+         if slice == "":
+             continue
+         temp_text, temp_lang = [], []
+         sentences_list = split_by_language(slice, target_languages=["zh", "ja", "en"])
+         for sentence, lang in sentences_list:
+             if sentence == "":
+                 continue
+             temp_text.append(sentence)
+             temp_lang.append(lang.upper())
+         _text.append(temp_text)
+         _lang.append(temp_lang)
+     return _text, _lang
+
+
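+ # Route a request by language mode: "mix" parses speaker/language markup, "auto"
+ # detects languages per segment, and a fixed language goes straight to generate_audio.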
232
+ def process_text(
233
+ text: str,
234
+ speaker,
235
+ sdp_ratio,
236
+ noise_scale,
237
+ noise_scale_w,
238
+ length_scale,
239
+ language,
240
+ reference_audio,
241
+ emotion,
242
+ style_text=None,
243
+ style_weight=0,
244
+ ):
245
+ audio_list = []
246
+ if language == "mix":
247
+ bool_valid, str_valid = re_matching.validate_text(text)
248
+ if not bool_valid:
249
+ return str_valid, (
250
+ hps.data.sampling_rate,
251
+ np.concatenate([np.zeros(hps.data.sampling_rate // 2)]),
252
+ )
253
+ for slice in re_matching.text_matching(text):
254
+ _text, _lang, _speaker = process_mix(slice)
255
+ if _speaker is None:
256
+ continue
257
+ print(f"Text: {_text}\nLang: {_lang}")
258
+ audio_list.extend(
259
+ generate_audio_multilang(
260
+ _text,
261
+ sdp_ratio,
262
+ noise_scale,
263
+ noise_scale_w,
264
+ length_scale,
265
+ _speaker,
266
+ _lang,
267
+ reference_audio,
268
+ emotion,
269
+ )
270
+ )
271
+ elif language.lower() == "auto":
272
+ _text, _lang = process_auto(text)
273
+ print(f"Text: {_text}\nLang: {_lang}")
274
+ _lang = [[lang.replace("JA", "JP") for lang in lang_list] for lang_list in _lang]
275
+ audio_list.extend(
276
+ generate_audio_multilang(
277
+ _text,
278
+ sdp_ratio,
279
+ noise_scale,
280
+ noise_scale_w,
281
+ length_scale,
282
+ speaker,
283
+ _lang,
284
+ reference_audio,
285
+ emotion,
286
+ )
287
+ )
288
+ else:
289
+ audio_list.extend(
290
+ generate_audio(
291
+ text.split("|"),
292
+ sdp_ratio,
293
+ noise_scale,
294
+ noise_scale_w,
295
+ length_scale,
296
+ speaker,
297
+ language,
298
+ reference_audio,
299
+ emotion,
300
+ style_text,
301
+ style_weight,
302
+ )
303
+ )
304
+ return audio_list
305
+
306
+
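+ # Gradio callback for the generate button: resolves the prompt mode (the text
+ # prompt arrives as `emotion`, the audio prompt as `reference_audio`) and returns
+ # a status string plus (sampling_rate, waveform).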
+ def tts_fn(
+     text: str,
+     speaker,
+     sdp_ratio,
+     noise_scale,
+     noise_scale_w,
+     length_scale,
+     language,
+     reference_audio,
+     emotion,
+     prompt_mode,
+     style_text=None,
+     style_weight=0,
+ ):
+     if style_text == "":
+         style_text = None
+     if prompt_mode == "Audio prompt":
+         if reference_audio is None:
+             return ("Invalid audio prompt", None)
+         else:
+             reference_audio = load_audio(reference_audio)[1]
+     else:
+         reference_audio = None
+
+     audio_list = process_text(
+         text,
+         speaker,
+         sdp_ratio,
+         noise_scale,
+         noise_scale_w,
+         length_scale,
+         language,
+         reference_audio,
+         emotion,
+         style_text,
+         style_weight,
+     )
+
+     audio_concat = np.concatenate(audio_list)
+     return "Success", (hps.data.sampling_rate, audio_concat)
+
+
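+ # Rewrite plain text as "mix"-format markup ([speaker]<lang>segment...|) using the
+ # auto language detector; returns the "mix" mode name and the formatted string.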
+ def format_utils(text, speaker):
+     _text, _lang = process_auto(text)
+     res = f"[{speaker}]"
+     for lang_s, content_s in zip(_lang, _text):
+         for lang, content in zip(lang_s, content_s):
+             res += f"<{lang.lower()}>{content}"
+         res += "|"
+     return "mix", res[:-1]
+
+
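+ # Load a reference audio file resampled to 48 kHz and return (sr, samples).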
+ def load_audio(path):
+     # librosa >= 0.10 requires the sampling rate as a keyword argument.
+     audio, sr = librosa.load(path, sr=48000)
+     # audio = librosa.resample(audio, 44100, 48000)
+     return sr, audio
+
+
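+ # Toggle visibility between the text-prompt and audio-prompt widgets.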
+ def gr_util(item):
+     if item == "Text prompt":
+         return {"visible": True, "__type__": "update"}, {
+             "visible": False,
+             "__type__": "update",
+         }
+     else:
+         return {"visible": False, "__type__": "update"}, {
+             "visible": True,
+             "__type__": "update",
+         }
+
+
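+ # Entry point: load hyperparameters and the model checkpoint, build the Gradio
+ # UI, and launch the web server.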
378
+ if __name__ == "__main__":
379
+ if config.webui_config.debug:
380
+ logger.info("Enable DEBUG-LEVEL log")
381
+ logging.basicConfig(level=logging.DEBUG)
382
+ hps = utils.get_hparams_from_file(config.webui_config.config_path)
383
+ # 若config.json中未指定版本则默认为最新版本
384
+ version = hps.version if hasattr(hps, "version") else latest_version
385
+ net_g = get_net_g(
386
+ model_path=config.webui_config.model, version=version, device=device, hps=hps
387
+ )
388
+ speaker_ids = hps.data.spk2id
389
+ speakers = list(speaker_ids.keys())
390
+ languages = ["ZH", "JP", "EN", "auto", "mix"]
391
+ with gr.Blocks() as app:
392
+ with gr.Row():
393
+ with gr.Column():
394
+ gr.Markdown(value="""
395
+ 【崩坏:星穹铁道】AI知更鸟 在线语音合成(Bert-Vits2)\n
396
+ 作者:幽花夜雪 https://space.bilibili.com/3493264718039598\n
397
+ 数据集作者:@红血球AE3803(https://space.bilibili.com/6589795)\n
398
+ 仓库地址:https://github.com/AI-Hobbyist/StarRail_Datasets\n
399
+ 声音归属:钱琛 https://space.bilibili.com/16582122\n
400
+ Bert-VITS2项目:https://github.com/fishaudio/Bert-VITS2\n
401
+ 使用本模型请严格遵守法律法规!\n
402
+ 发布二创作品请标注本项目作者及链接、作品使用Bert-VITS2 AI生成!\n
403
+ 【提示】手机端容易误触调节,请刷新恢复默认!每次生成的结果都不一样,效果不好请尝试多次生成与调节,选择最佳结果!\n
404
+ """)
405
+ text = gr.TextArea(
406
+ label="输入文本内容",
407
+ placeholder="""
408
+ 推荐不同语言分开推理,因为无法连贯且可能影响最终效果!
409
+ 如果选择语言为\'mix\',必须按照格式输入,否则报错:
410
+ 格式举例(zh是中文,jp是日语,en是英语;不区分大小写):
411
+ [说话人]<zh>你好 <jp>こんにちは <en>Hello
412
+ 另外,所有的语言选项都可以用'|'分割长段实现分句生成。
413
+ """,
414
+ )
415
+ speaker = gr.Dropdown(
416
+ choices=speakers, value=speakers[0], label="Speaker"
417
+ )
418
+ _ = gr.Markdown(
419
+ value="提示模式(Prompt mode):可选文字提示或音频提示,用于生成文字或音频指定风格的声音。\n",
420
+ visible=False,
421
+ )
422
+ prompt_mode = gr.Radio(
423
+ ["Text prompt", "Audio prompt"],
424
+ label="Prompt Mode",
425
+ value="Text prompt",
426
+ visible=False,
427
+ )
428
+ text_prompt = gr.Textbox(
429
+ label="Text prompt",
430
+ placeholder="用文字描述生成风格。如:Happy",
431
+ value="Happy",
432
+ visible=False,
433
+ )
434
+ audio_prompt = gr.Audio(
435
+ label="Audio prompt", type="filepath", visible=False
436
+ )
437
+ sdp_ratio = gr.Slider(
438
+ minimum=0, maximum=1, value=0.6, step=0.01, label="SDP Ratio"
439
+ )
440
+ noise_scale = gr.Slider(
441
+ minimum=0.1, maximum=2, value=0.5, step=0.01, label="Noise"
442
+ )
443
+ noise_scale_w = gr.Slider(
444
+ minimum=0.1, maximum=2, value=0.9, step=0.01, label="Noise_W"
445
+ )
446
+ length_scale = gr.Slider(
447
+ minimum=0.1, maximum=2, value=1.0, step=0.01, label="Length"
448
+ )
449
+ language = gr.Dropdown(
450
+ choices=languages, value=languages[0], label="语言"
451
+ )
452
+ btn = gr.Button("点击生成", variant="primary")
453
+ with gr.Column():
454
+ with gr.Accordion("融合文本语义(实验性功能)", open=False):
455
+ gr.Markdown(
456
+ value="使用辅助文本的语意来辅助生成对话(语言保持与主文本相同)\n\n"
457
+ "**注意**:使用**带有强烈情感的文本**(如:我好快乐!!!���\n\n"
458
+ "效果较不明确,留空即为不使用该功能"
459
+ )
460
+ style_text = gr.Textbox(label="辅助文本")
461
+ style_weight = gr.Slider(
462
+ minimum=0,
463
+ maximum=1,
464
+ value=0.7,
465
+ step=0.1,
466
+ label="Weight",
467
+ info="主文本和辅助文本的bert混合比率,0表示仅主文本,1表示仅辅助文本",
468
+ )
469
+ with gr.Row():
470
+ with gr.Column():
471
+ interval_between_sent = gr.Slider(
472
+ minimum=0,
473
+ maximum=5,
474
+ value=0.2,
475
+ step=0.1,
476
+ label="句间停顿(秒),勾选按句切分才生效",
477
+ )
478
+ interval_between_para = gr.Slider(
479
+ minimum=0,
480
+ maximum=10,
481
+ value=1,
482
+ step=0.1,
483
+ label="段间停顿(秒),需要大于句间停顿才有效",
484
+ )
485
+ opt_cut_by_sent = gr.Checkbox(
486
+ label="按句切分 在按段落切分的基础上再按句子切分文本"
487
+ )
488
+ slicer = gr.Button("切分生成", variant="primary")
489
+ text_output = gr.Textbox(label="状态信息")
490
+ audio_output = gr.Audio(label="输出音频")
491
+ # explain_image = gr.Image(
492
+ # label="参数解释信息",
493
+ # show_label=True,
494
+ # show_share_button=False,
495
+ # show_download_button=False,
496
+ # value=os.path.abspath("./img/参数说明.png"),
497
+ # )
498
+ btn.click(
499
+ tts_fn,
500
+ inputs=[
501
+ text,
502
+ speaker,
503
+ sdp_ratio,
504
+ noise_scale,
505
+ noise_scale_w,
506
+ length_scale,
507
+ language,
508
+ audio_prompt,
509
+ text_prompt,
510
+ prompt_mode,
511
+ style_text,
512
+ style_weight,
513
+ ],
514
+ outputs=[text_output, audio_output],
515
+ )
516
+ slicer.click(
517
+ tts_split,
518
+ inputs=[
519
+ text,
520
+ speaker,
521
+ sdp_ratio,
522
+ noise_scale,
523
+ noise_scale_w,
524
+ length_scale,
525
+ language,
526
+ opt_cut_by_sent,
527
+ interval_between_para,
528
+ interval_between_sent,
529
+ audio_prompt,
530
+ text_prompt,
531
+ style_text,
532
+ style_weight,
533
+ ],
534
+ outputs=[text_output, audio_output],
535
+ )
536
+
537
+ prompt_mode.change(
538
+ lambda x: gr_util(x),
539
+ inputs=[prompt_mode],
540
+ outputs=[text_prompt, audio_prompt],
541
+ )
542
+
543
+ audio_prompt.upload(
544
+ lambda x: load_audio(x),
545
+ inputs=[audio_prompt],
546
+ outputs=[audio_prompt],
547
+ )
548
+
549
+ print("推理页面已开启!")
550
+ webbrowser.open(f"http://127.0.0.1:{config.webui_config.port}")
551
+ app.launch(share=config.webui_config.share, server_port=config.webui_config.port)