ahmadalfakeh committed on
Commit 1e096b9 · verified · 1 Parent(s): 3ffd934

Upload 6 files

Files changed (6)
  1. LICENSE +202 -0
  2. README.md +14 -0
  3. app.py +267 -0
  4. gitattributes +35 -0
  5. model.py +830 -0
  6. requirements (1).txt +4 -0
LICENSE ADDED
@@ -0,0 +1,202 @@
1
+
2
+ Apache License
3
+ Version 2.0, January 2004
4
+ http://www.apache.org/licenses/
5
+
6
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7
+
8
+ 1. Definitions.
9
+
10
+ "License" shall mean the terms and conditions for use, reproduction,
11
+ and distribution as defined by Sections 1 through 9 of this document.
12
+
13
+ "Licensor" shall mean the copyright owner or entity authorized by
14
+ the copyright owner that is granting the License.
15
+
16
+ "Legal Entity" shall mean the union of the acting entity and all
17
+ other entities that control, are controlled by, or are under common
18
+ control with that entity. For the purposes of this definition,
19
+ "control" means (i) the power, direct or indirect, to cause the
20
+ direction or management of such entity, whether by contract or
21
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
22
+ outstanding shares, or (iii) beneficial ownership of such entity.
23
+
24
+ "You" (or "Your") shall mean an individual or Legal Entity
25
+ exercising permissions granted by this License.
26
+
27
+ "Source" form shall mean the preferred form for making modifications,
28
+ including but not limited to software source code, documentation
29
+ source, and configuration files.
30
+
31
+ "Object" form shall mean any form resulting from mechanical
32
+ transformation or translation of a Source form, including but
33
+ not limited to compiled object code, generated documentation,
34
+ and conversions to other media types.
35
+
36
+ "Work" shall mean the work of authorship, whether in Source or
37
+ Object form, made available under the License, as indicated by a
38
+ copyright notice that is included in or attached to the work
39
+ (an example is provided in the Appendix below).
40
+
41
+ "Derivative Works" shall mean any work, whether in Source or Object
42
+ form, that is based on (or derived from) the Work and for which the
43
+ editorial revisions, annotations, elaborations, or other modifications
44
+ represent, as a whole, an original work of authorship. For the purposes
45
+ of this License, Derivative Works shall not include works that remain
46
+ separable from, or merely link (or bind by name) to the interfaces of,
47
+ the Work and Derivative Works thereof.
48
+
49
+ "Contribution" shall mean any work of authorship, including
50
+ the original version of the Work and any modifications or additions
51
+ to that Work or Derivative Works thereof, that is intentionally
52
+ submitted to Licensor for inclusion in the Work by the copyright owner
53
+ or by an individual or Legal Entity authorized to submit on behalf of
54
+ the copyright owner. For the purposes of this definition, "submitted"
55
+ means any form of electronic, verbal, or written communication sent
56
+ to the Licensor or its representatives, including but not limited to
57
+ communication on electronic mailing lists, source code control systems,
58
+ and issue tracking systems that are managed by, or on behalf of, the
59
+ Licensor for the purpose of discussing and improving the Work, but
60
+ excluding communication that is conspicuously marked or otherwise
61
+ designated in writing by the copyright owner as "Not a Contribution."
62
+
63
+ "Contributor" shall mean Licensor and any individual or Legal Entity
64
+ on behalf of whom a Contribution has been received by Licensor and
65
+ subsequently incorporated within the Work.
66
+
67
+ 2. Grant of Copyright License. Subject to the terms and conditions of
68
+ this License, each Contributor hereby grants to You a perpetual,
69
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70
+ copyright license to reproduce, prepare Derivative Works of,
71
+ publicly display, publicly perform, sublicense, and distribute the
72
+ Work and such Derivative Works in Source or Object form.
73
+
74
+ 3. Grant of Patent License. Subject to the terms and conditions of
75
+ this License, each Contributor hereby grants to You a perpetual,
76
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77
+ (except as stated in this section) patent license to make, have made,
78
+ use, offer to sell, sell, import, and otherwise transfer the Work,
79
+ where such license applies only to those patent claims licensable
80
+ by such Contributor that are necessarily infringed by their
81
+ Contribution(s) alone or by combination of their Contribution(s)
82
+ with the Work to which such Contribution(s) was submitted. If You
83
+ institute patent litigation against any entity (including a
84
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
85
+ or a Contribution incorporated within the Work constitutes direct
86
+ or contributory patent infringement, then any patent licenses
87
+ granted to You under this License for that Work shall terminate
88
+ as of the date such litigation is filed.
89
+
90
+ 4. Redistribution. You may reproduce and distribute copies of the
91
+ Work or Derivative Works thereof in any medium, with or without
92
+ modifications, and in Source or Object form, provided that You
93
+ meet the following conditions:
94
+
95
+ (a) You must give any other recipients of the Work or
96
+ Derivative Works a copy of this License; and
97
+
98
+ (b) You must cause any modified files to carry prominent notices
99
+ stating that You changed the files; and
100
+
101
+ (c) You must retain, in the Source form of any Derivative Works
102
+ that You distribute, all copyright, patent, trademark, and
103
+ attribution notices from the Source form of the Work,
104
+ excluding those notices that do not pertain to any part of
105
+ the Derivative Works; and
106
+
107
+ (d) If the Work includes a "NOTICE" text file as part of its
108
+ distribution, then any Derivative Works that You distribute must
109
+ include a readable copy of the attribution notices contained
110
+ within such NOTICE file, excluding those notices that do not
111
+ pertain to any part of the Derivative Works, in at least one
112
+ of the following places: within a NOTICE text file distributed
113
+ as part of the Derivative Works; within the Source form or
114
+ documentation, if provided along with the Derivative Works; or,
115
+ within a display generated by the Derivative Works, if and
116
+ wherever such third-party notices normally appear. The contents
117
+ of the NOTICE file are for informational purposes only and
118
+ do not modify the License. You may add Your own attribution
119
+ notices within Derivative Works that You distribute, alongside
120
+ or as an addendum to the NOTICE text from the Work, provided
121
+ that such additional attribution notices cannot be construed
122
+ as modifying the License.
123
+
124
+ You may add Your own copyright statement to Your modifications and
125
+ may provide additional or different license terms and conditions
126
+ for use, reproduction, or distribution of Your modifications, or
127
+ for any such Derivative Works as a whole, provided Your use,
128
+ reproduction, and distribution of the Work otherwise complies with
129
+ the conditions stated in this License.
130
+
131
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
132
+ any Contribution intentionally submitted for inclusion in the Work
133
+ by You to the Licensor shall be under the terms and conditions of
134
+ this License, without any additional terms or conditions.
135
+ Notwithstanding the above, nothing herein shall supersede or modify
136
+ the terms of any separate license agreement you may have executed
137
+ with Licensor regarding such Contributions.
138
+
139
+ 6. Trademarks. This License does not grant permission to use the trade
140
+ names, trademarks, service marks, or product names of the Licensor,
141
+ except as required for reasonable and customary use in describing the
142
+ origin of the Work and reproducing the content of the NOTICE file.
143
+
144
+ 7. Disclaimer of Warranty. Unless required by applicable law or
145
+ agreed to in writing, Licensor provides the Work (and each
146
+ Contributor provides its Contributions) on an "AS IS" BASIS,
147
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148
+ implied, including, without limitation, any warranties or conditions
149
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150
+ PARTICULAR PURPOSE. You are solely responsible for determining the
151
+ appropriateness of using or redistributing the Work and assume any
152
+ risks associated with Your exercise of permissions under this License.
153
+
154
+ 8. Limitation of Liability. In no event and under no legal theory,
155
+ whether in tort (including negligence), contract, or otherwise,
156
+ unless required by applicable law (such as deliberate and grossly
157
+ negligent acts) or agreed to in writing, shall any Contributor be
158
+ liable to You for damages, including any direct, indirect, special,
159
+ incidental, or consequential damages of any character arising as a
160
+ result of this License or out of the use or inability to use the
161
+ Work (including but not limited to damages for loss of goodwill,
162
+ work stoppage, computer failure or malfunction, or any and all
163
+ other commercial damages or losses), even if such Contributor
164
+ has been advised of the possibility of such damages.
165
+
166
+ 9. Accepting Warranty or Additional Liability. While redistributing
167
+ the Work or Derivative Works thereof, You may choose to offer,
168
+ and charge a fee for, acceptance of support, warranty, indemnity,
169
+ or other liability obligations and/or rights consistent with this
170
+ License. However, in accepting such obligations, You may act only
171
+ on Your own behalf and on Your sole responsibility, not on behalf
172
+ of any other Contributor, and only if You agree to indemnify,
173
+ defend, and hold each Contributor harmless for any liability
174
+ incurred by, or claims asserted against, such Contributor by reason
175
+ of your accepting any such warranty or additional liability.
176
+
177
+ END OF TERMS AND CONDITIONS
178
+
179
+ APPENDIX: How to apply the Apache License to your work.
180
+
181
+ To apply the Apache License to your work, attach the following
182
+ boilerplate notice, with the fields enclosed by brackets "[]"
183
+ replaced with your own identifying information. (Don't include
184
+ the brackets!) The text should be enclosed in the appropriate
185
+ comment syntax for the file format. We also recommend that a
186
+ file or class name and description of purpose be included on the
187
+ same "printed page" as the copyright notice for easier
188
+ identification within third-party archives.
189
+
190
+ Copyright [yyyy] [name of copyright owner]
191
+
192
+ Licensed under the Apache License, Version 2.0 (the "License");
193
+ you may not use this file except in compliance with the License.
194
+ You may obtain a copy of the License at
195
+
196
+ http://www.apache.org/licenses/LICENSE-2.0
197
+
198
+ Unless required by applicable law or agreed to in writing, software
199
+ distributed under the License is distributed on an "AS IS" BASIS,
200
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201
+ See the License for the specific language governing permissions and
202
+ limitations under the License.
README.md ADDED
@@ -0,0 +1,14 @@
1
+ ---
2
+ title: tts Text To Speech
3
+ emoji: 🌍
4
+ colorFrom: yellow
5
+ colorTo: pink
6
+ sdk: gradio
7
+ sdk_version: 4.36.1
8
+ python_version: 3.8.9
9
+ app_file: app.py
10
+ pinned: false
11
+ license: apache-2.0
12
+ ---
13
+
14
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
app.py ADDED
@@ -0,0 +1,267 @@
1
+ #!/usr/bin/env python3
2
+ #
3
+ # Copyright 2022-2023 Xiaomi Corp. (authors: Fangjun Kuang)
4
+ #
5
+ # See LICENSE for clarification regarding multiple authors
6
+ #
7
+ # Licensed under the Apache License, Version 2.0 (the "License");
8
+ # you may not use this file except in compliance with the License.
9
+ # You may obtain a copy of the License at
10
+ #
11
+ # http://www.apache.org/licenses/LICENSE-2.0
12
+ #
13
+ # Unless required by applicable law or agreed to in writing, software
14
+ # distributed under the License is distributed on an "AS IS" BASIS,
15
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
16
+ # See the License for the specific language governing permissions and
17
+ # limitations under the License.
18
+
19
+ # References:
20
+ # https://gradio.app/docs/#dropdown
21
+
22
+ import os
23
+ import time
24
+ import uuid
25
+ from datetime import datetime
26
+
27
+ import gradio as gr
28
+ import soundfile as sf
29
+
30
+ from model import get_pretrained_model, language_to_models
31
+
32
+
33
+ def MyPrint(s):
34
+ now = datetime.now()
35
+ date_time = now.strftime("%Y-%m-%d %H:%M:%S.%f")
36
+ print(f"{date_time}: {s}")
37
+
38
+
39
+ title = "# Next-gen Kaldi: Text-to-speech (TTS)"
40
+
41
+ description = """
42
+ This space shows how to convert text to speech with Next-gen Kaldi.
43
+
44
+ It runs on CPU inside a Docker container provided by Hugging Face.
45
+
46
+ See more information by visiting the following links:
47
+
48
+ - <https://github.com/k2-fsa/sherpa-onnx>
49
+
50
+ If you want to deploy it locally, please see
51
+ <https://k2-fsa.github.io/sherpa/>
52
+
53
+ If you want to use Android APKs, please see
54
+ <https://k2-fsa.github.io/sherpa/onnx/tts/apk.html>
55
+
56
+ If you want to use Android text-to-speech engine APKs, please see
57
+ <https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html>
58
+
59
+ If you want to download an all-in-one exe for Windows, please see
60
+ <https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models>
61
+
62
+ """
63
+
64
+ # css style is copied from
65
+ # https://huggingface.co/spaces/alphacep/asr/blob/main/app.py#L113
66
+ css = """
67
+ .result {display:flex;flex-direction:column}
68
+ .result_item {padding:15px;margin-bottom:8px;border-radius:15px;width:100%}
69
+ .result_item_success {background-color:mediumaquamarine;color:white;align-self:start}
70
+ .result_item_error {background-color:#ff7070;color:white;align-self:start}
71
+ """
72
+
73
+ examples = [
74
+ [
75
+ "Chinese (Mandarin, 普通话)",
76
+ "csukuangfj/vits-zh-hf-fanchen-wnj|1",
77
+ "在一个阳光明媚的夏天,小马、小羊和小狗它们一块儿在广阔的草地上,嬉戏玩耍,这时小猴来了,还带着它心爱的足球活蹦乱跳地跑前、跑后教小马、小羊、小狗踢足球。",
78
+ 0,
79
+ 1.0,
80
+ ],
81
+ [
82
+ "Chinese (Mandarin, 普通话)",
83
+ "csukuangfj/vits-zh-hf-fanchen-C|187",
84
+ '小米的使命是,始终坚持做"感动人心、价格厚道"的好产品,让全球每个人都能享受科技带来的美好生活。',
85
+ 0,
86
+ 1.0,
87
+ ],
88
+ ["Min-nan (闽南话)", "csukuangfj/vits-mms-nan", "ài piaǸ chiah ē iaN̂", 0, 1.0],
89
+ ["Thai", "csukuangfj/vits-mms-tha", "ฉันรักคุณ", 0, 1.0],
90
+ [
91
+ "Chinese (Mandarin, 普通话)",
92
+ "csukuangfj/sherpa-onnx-vits-zh-ll|5",
93
+ "当夜幕降临,星光点点,伴随着微风拂面,我在静谧中感受着时光的流转,思念如涟漪荡漾,梦境如画卷展开,我与自然融为一体,沉静在这片宁静的美丽之中,感受着生命的奇迹与温柔。",
94
+ 2,
95
+ 1.0,
96
+ ],
97
+ ]
98
+
99
+
100
+ def update_model_dropdown(language: str):
101
+ if language in language_to_models:
102
+ choices = language_to_models[language]
103
+ return gr.Dropdown(
104
+ choices=choices,
105
+ value=choices[0],
106
+ interactive=True,
107
+ )
108
+
109
+ raise ValueError(f"Unsupported language: {language}")
110
+
111
+
112
+ def build_html_output(s: str, style: str = "result_item_success"):
113
+ return f"""
114
+ <div class='result'>
115
+ <div class='result_item {style}'>
116
+ {s}
117
+ </div>
118
+ </div>
119
+ """
120
+
121
+
122
+ def process(language: str, repo_id: str, text: str, sid: str, speed: float):
123
+ MyPrint(f"Input text: {text}. sid: {sid}, speed: {speed}")
124
+ sid = int(sid)
125
+ tts = get_pretrained_model(repo_id, speed)
126
+
127
+ start = time.time()
128
+ audio = tts.generate(text, sid=sid)
129
+ end = time.time()
130
+
131
+ if len(audio.samples) == 0:
132
+ raise ValueError(
133
+ "Error generating audio. Please check the preceding error messages."
134
+ )
135
+
136
+ duration = len(audio.samples) / audio.sample_rate
137
+
138
+ elapsed_seconds = end - start
139
+ rtf = elapsed_seconds / duration
140
+
141
+ info = f"""
142
+ Wave duration : {duration:.3f} s <br/>
143
+ Processing time: {elapsed_seconds:.3f} s <br/>
144
+ RTF: {elapsed_seconds:.3f}/{duration:.3f} = {rtf:.3f} <br/>
145
+ """
146
+
147
+ MyPrint(info)
148
+ MyPrint(f"\nrepo_id: {repo_id}\ntext: {text}\nsid: {sid}\nspeed: {speed}")
149
+
150
+ filename = str(uuid.uuid4())
151
+ filename = f"{filename}.wav"
152
+ sf.write(
153
+ filename,
154
+ audio.samples,
155
+ samplerate=audio.sample_rate,
156
+ subtype="PCM_16",
157
+ )
158
+
159
+ return filename, build_html_output(info)
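The real-time factor (RTF) that `process` reports is the processing time divided by the duration of the generated audio, and the duration is the sample count divided by the sample rate. A minimal sketch of that arithmetic follows; the helper name `real_time_factor` and the numbers are illustrative assumptions, not part of this commit.

```python
def real_time_factor(num_samples: int, sample_rate: int, elapsed_seconds: float) -> float:
    """Return processing time divided by audio duration (hypothetical helper)."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    duration = num_samples / sample_rate  # seconds of generated audio
    return elapsed_seconds / duration


# Illustrative numbers only: 3 s of audio at 22050 Hz generated in 0.75 s.
rtf = real_time_factor(num_samples=66150, sample_rate=22050, elapsed_seconds=0.75)
print(f"RTF = {rtf:.3f}")
```

An RTF below 1 means the audio was generated faster than it takes to play back.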
160
+
161
+
162
+ demo = gr.Blocks(css=css)
163
+
164
+
165
+ with demo:
166
+ gr.Markdown(title)
167
+ language_choices = list(language_to_models.keys())
168
+
169
+ language_radio = gr.Radio(
170
+ label="Language",
171
+ choices=language_choices,
172
+ value=language_choices[0],
173
+ )
174
+
175
+ model_dropdown = gr.Dropdown(
176
+ choices=language_to_models[language_choices[0]],
177
+ label="Select a model",
178
+ value=language_to_models[language_choices[0]][0],
179
+ )
180
+
181
+ language_radio.change(
182
+ update_model_dropdown,
183
+ inputs=language_radio,
184
+ outputs=model_dropdown,
185
+ )
186
+
187
+ with gr.Tabs():
188
+ with gr.TabItem("Please input your text"):
189
+ input_text = gr.Textbox(
190
+ label="Input text",
191
+ info="Your text",
192
+ lines=3,
193
+ placeholder="Please input your text here",
194
+ )
195
+
196
+ input_sid = gr.Textbox(
197
+ label="Speaker ID",
198
+ info="Speaker ID",
199
+ lines=1,
200
+ max_lines=1,
201
+ value="0",
202
+ placeholder="Speaker ID. Valid only for multi-speaker models",
203
+ )
204
+
205
+ input_speed = gr.Slider(
206
+ minimum=0.1,
207
+ maximum=10,
208
+ value=1,
209
+ step=0.1,
210
+ label="Speed (larger->faster; smaller->slower)",
211
+ )
212
+
213
+ input_button = gr.Button("Submit")
214
+
215
+ output_audio = gr.Audio(label="Output")
216
+
217
+ output_info = gr.HTML(label="Info")
218
+
219
+ gr.Examples(
220
+ examples=examples,
221
+ fn=process,
222
+ inputs=[
223
+ language_radio,
224
+ model_dropdown,
225
+ input_text,
226
+ input_sid,
227
+ input_speed,
228
+ ],
229
+ outputs=[
230
+ output_audio,
231
+ output_info,
232
+ ],
233
+ )
234
+
235
+ input_button.click(
236
+ process,
237
+ inputs=[
238
+ language_radio,
239
+ model_dropdown,
240
+ input_text,
241
+ input_sid,
242
+ input_speed,
243
+ ],
244
+ outputs=[
245
+ output_audio,
246
+ output_info,
247
+ ],
248
+ )
249
+
250
+ gr.Markdown(description)
251
+
252
+
253
+ def download_espeak_ng_data():
254
+ os.system(
255
+ """
256
+ cd /tmp
257
+ wget -qq https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/espeak-ng-data.tar.bz2
258
+ tar xf espeak-ng-data.tar.bz2
259
+ """
260
+ )
261
+
262
+
263
+ if __name__ == "__main__":
264
+ download_espeak_ng_data()
265
+ formatter = "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"
266
+
267
+ demo.launch()
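`download_espeak_ng_data` shells out to `wget` and `tar` through `os.system`, which ignores failures silently. One possible alternative, sketched here with the standard library only, downloads and unpacks the archive in Python so that errors surface as exceptions. The function names `extract_archive` and `download_and_extract` are assumptions made for this sketch and do not appear in the commit.

```python
import tarfile
import urllib.request
from pathlib import Path


def extract_archive(archive_path: str, dest_dir: str) -> None:
    """Extract a (possibly compressed) tar archive into dest_dir."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression (bz2, gz, xz, or none).
    with tarfile.open(archive_path, mode="r:*") as tar:
        # Only do this for archives from a trusted source; for untrusted
        # input, validate member paths before extracting.
        tar.extractall(dest_dir)


def download_and_extract(url: str, dest_dir: str) -> None:
    """Download url into dest_dir and unpack it there; raises on failure."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    archive_path = Path(dest_dir) / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive_path))
    extract_archive(str(archive_path), dest_dir)
```

Calling `download_and_extract` with the `espeak-ng-data.tar.bz2` URL from the diff and `/tmp` as the destination would mirror what the shell snippet does, assuming the archive unpacks to an `espeak-ng-data` directory as `model.py` expects.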
gitattributes ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
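Each line in the attributes file above routes files matching a glob pattern through Git LFS. As a rough illustration, the sketch below checks file names against a handful of those patterns with Python's `fnmatch`. This is only an approximation: Git's own pattern rules differ in places (for example `saved_model/**/*`), and the helper `is_lfs_tracked` is a hypothetical name introduced here.

```python
from fnmatch import fnmatch

# A few of the patterns listed in the attributes file above.
LFS_PATTERNS = ["*.onnx", "*.bz2", "*.tar.*", "*.zip", "*tfevents*"]


def is_lfs_tracked(filename: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: does the file name match any LFS pattern?"""
    return any(fnmatch(filename, p) for p in patterns)


for name in ["model.onnx", "espeak-ng-data.tar.bz2", "app.py"]:
    print(name, is_lfs_tracked(name))
```

For an authoritative answer on a real repository, `git check-attr filter -- <path>` reports the attribute Git actually applies.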
model.py ADDED
@@ -0,0 +1,830 @@
1
+ # Copyright 2022-2023 Xiaomi Corp. (authors: Fangjun Kuang)
2
+ #
3
+ # See LICENSE for clarification regarding multiple authors
4
+ #
5
+ # Licensed under the Apache License, Version 2.0 (the "License");
6
+ # you may not use this file except in compliance with the License.
7
+ # You may obtain a copy of the License at
8
+ #
9
+ # http://www.apache.org/licenses/LICENSE-2.0
10
+ #
11
+ # Unless required by applicable law or agreed to in writing, software
12
+ # distributed under the License is distributed on an "AS IS" BASIS,
13
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14
+ # See the License for the specific language governing permissions and
15
+ # limitations under the License.
16
+
17
+ import os
18
+ from functools import lru_cache
19
+ from pathlib import Path
20
+
21
+ import sherpa_onnx
22
+ from huggingface_hub import hf_hub_download
23
+
24
+
25
+ def get_file(
26
+ repo_id: str,
27
+ filename: str,
28
+ subfolder: str = ".",
29
+ ) -> str:
30
+ model_filename = hf_hub_download(
31
+ repo_id=repo_id,
32
+ filename=filename,
33
+ subfolder=subfolder,
34
+ )
35
+ return model_filename
36
+
37
+
38
+ @lru_cache(maxsize=10)
39
+ def _get_vits_vctk(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
40
+ assert repo_id == "csukuangfj/vits-vctk"
41
+
42
+ model = get_file(
43
+ repo_id=repo_id,
44
+ filename="vits-vctk.onnx",
45
+ subfolder=".",
46
+ )
47
+
48
+ lexicon = get_file(
49
+ repo_id=repo_id,
50
+ filename="lexicon.txt",
51
+ subfolder=".",
52
+ )
53
+
54
+ tokens = get_file(
55
+ repo_id=repo_id,
56
+ filename="tokens.txt",
57
+ subfolder=".",
58
+ )
59
+
60
+ tts_config = sherpa_onnx.OfflineTtsConfig(
61
+ model=sherpa_onnx.OfflineTtsModelConfig(
62
+ vits=sherpa_onnx.OfflineTtsVitsModelConfig(
63
+ model=model,
64
+ lexicon=lexicon,
65
+ tokens=tokens,
66
+ length_scale=1.0 / speed,
67
+ ),
68
+ provider="cpu",
69
+ debug=True,
70
+ num_threads=2,
71
+ ),
72
+ max_num_sentences=1,
73
+ )
74
+ tts = sherpa_onnx.OfflineTts(tts_config)
75
+
76
+ return tts
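The loader above is wrapped in `lru_cache` and keyed on `(repo_id, speed)`, and it passes `length_scale=1.0 / speed` to the VITS config. The sketch below shows the caching consequence with a hypothetical stand-in loader (`load_model` returns a plain dict, not a real `sherpa_onnx.OfflineTts`): the same pair reuses the cached object, while every new speed value creates a separate cache entry.

```python
from functools import lru_cache

load_count = 0  # counts how often the loader body actually runs


@lru_cache(maxsize=10)
def load_model(repo_id: str, speed: float) -> dict:
    """Hypothetical stand-in for a loader; returns a dict, not a TTS object."""
    global load_count
    load_count += 1
    return {"repo_id": repo_id, "length_scale": 1.0 / speed}


a = load_model("csukuangfj/vits-vctk", 1.0)
b = load_model("csukuangfj/vits-vctk", 1.0)  # cache hit: same object, no reload
c = load_model("csukuangfj/vits-vctk", 2.0)  # new speed, so a new cache entry

print(a is b, load_count, c["length_scale"])
```

Because the speed slider in `app.py` moves in steps of 0.1, a cache of 10 entries can fill quickly if users try many speeds; whether that matters depends on how expensive a reload is in practice.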
77
+
78
+
79
+ @lru_cache(maxsize=10)
80
+ def _get_vits_ljs(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
81
+ assert repo_id == "csukuangfj/vits-ljs"
82
+
83
+ model = get_file(
84
+ repo_id=repo_id,
85
+ filename="vits-ljs.onnx",
86
+ subfolder=".",
87
+ )
88
+
89
+ lexicon = get_file(
90
+ repo_id=repo_id,
91
+ filename="lexicon.txt",
92
+ subfolder=".",
93
+ )
94
+
95
+ tokens = get_file(
96
+ repo_id=repo_id,
97
+ filename="tokens.txt",
98
+ subfolder=".",
99
+ )
100
+
101
+ tts_config = sherpa_onnx.OfflineTtsConfig(
102
+ model=sherpa_onnx.OfflineTtsModelConfig(
103
+ vits=sherpa_onnx.OfflineTtsVitsModelConfig(
104
+ model=model,
105
+ lexicon=lexicon,
106
+ tokens=tokens,
107
+ length_scale=1.0 / speed,
108
+ ),
109
+ provider="cpu",
110
+ debug=True,
111
+ num_threads=2,
112
+ ),
113
+ max_num_sentences=1,
114
+ )
115
+ tts = sherpa_onnx.OfflineTts(tts_config)
116
+
117
+ return tts
118
+
119
+
120
+ @lru_cache(maxsize=10)
121
+ def _get_vits_piper(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
122
+ data_dir = "/tmp/espeak-ng-data"
123
+ repo_id = repo_id.split("|")[0]
124
+
125
+ if "coqui" in repo_id or "vits-mms" in repo_id:
126
+ name = "model"
127
+ elif "piper" in repo_id:
128
+ n = len("vits-piper-")
129
+ name = repo_id.split("/")[1][n:]
130
+ elif "mimic3" in repo_id:
131
+ n = len("vits-mimic3-")
132
+ name = repo_id.split("/")[1][n:]
133
+ else:
134
+ raise ValueError(f"Unsupported {repo_id}")
135
+
136
+ if "vits-coqui-uk-mai" in repo_id or "vits-mms" in repo_id:
137
+ data_dir = ""
138
+
139
+ model = get_file(
140
+ repo_id=repo_id,
141
+ filename=f"{name}.onnx",
142
+ subfolder=".",
143
+ )
144
+
145
+ tokens = get_file(
146
+ repo_id=repo_id,
147
+ filename="tokens.txt",
148
+ subfolder=".",
149
+ )
150
+
151
+ tts_config = sherpa_onnx.OfflineTtsConfig(
152
+ model=sherpa_onnx.OfflineTtsModelConfig(
153
+ vits=sherpa_onnx.OfflineTtsVitsModelConfig(
154
+ model=model,
155
+ lexicon="",
156
+ data_dir=data_dir,
157
+ tokens=tokens,
158
+ length_scale=1.0 / speed,
159
+ ),
160
+ provider="cpu",
161
+ debug=True,
162
+ num_threads=2,
163
+ ),
164
+ max_num_sentences=1,
165
+ )
166
+ tts = sherpa_onnx.OfflineTts(tts_config)
167
+
168
+ return tts
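The branch at the top of `_get_vits_piper` decides which `.onnx` file to fetch from the repository name. The sketch below isolates that logic in a pure function so it can be checked without downloading anything. The name `piper_model_stem` is an assumption for this sketch, and the piper repo id in the example is illustrative rather than taken from this excerpt.

```python
def piper_model_stem(repo_id: str) -> str:
    """Mirror the branch logic above: map a repo id to the .onnx file stem."""
    repo_id = repo_id.split("|")[0]  # drop any "|<suffix>" part of the id
    if "coqui" in repo_id or "vits-mms" in repo_id:
        return "model"
    if "piper" in repo_id:
        return repo_id.split("/")[1][len("vits-piper-"):]
    if "mimic3" in repo_id:
        return repo_id.split("/")[1][len("vits-mimic3-"):]
    raise ValueError(f"Unsupported {repo_id}")


print(piper_model_stem("csukuangfj/vits-mms-nan"))
# Illustrative repo id, assumed for this example only.
print(piper_model_stem("csukuangfj/vits-piper-en_US-amy-low"))
```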
169
+
170
+
171
+ @lru_cache(maxsize=10)
172
+ def _get_vits_mms(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
173
+ return _get_vits_piper(repo_id, speed)
174
+
175
+
176
+ @lru_cache(maxsize=10)
177
+ def _get_vits_zh_aishell3(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
178
+ assert repo_id == "csukuangfj/vits-zh-aishell3"
179
+
180
+ model = get_file(
181
+ repo_id=repo_id,
182
+ filename="vits-aishell3.onnx",
183
+ subfolder=".",
184
+ )
185
+
186
+ lexicon = get_file(
187
+ repo_id=repo_id,
188
+ filename="lexicon.txt",
189
+ subfolder=".",
190
+ )
191
+
192
+ tokens = get_file(
193
+ repo_id=repo_id,
194
+ filename="tokens.txt",
195
+ subfolder=".",
196
+ )
197
+
198
+ rule_fsts = ["phone.fst", "date.fst", "number.fst", "new_heteronym.fst"]
199
+
200
+ rule_fsts = [
201
+ get_file(
202
+ repo_id=repo_id,
203
+ filename=f,
204
+ subfolder=".",
205
+ )
206
+ for f in rule_fsts
207
+ ]
208
+ rule_fsts = ",".join(rule_fsts)
209
+
210
+ rule_fars = get_file(
211
+ repo_id=repo_id,
212
+ filename="rule.far",
213
+ subfolder=".",
214
+ )
215
+
216
+ tts_config = sherpa_onnx.OfflineTtsConfig(
217
+ model=sherpa_onnx.OfflineTtsModelConfig(
218
+ vits=sherpa_onnx.OfflineTtsVitsModelConfig(
219
+ model=model,
220
+ lexicon=lexicon,
221
+ tokens=tokens,
222
+ length_scale=1.0 / speed,
223
+ ),
224
+ provider="cpu",
225
+ debug=True,
226
+ num_threads=2,
227
+ ),
228
+ rule_fsts=rule_fsts,
229
+ rule_fars=rule_fars,
230
+ max_num_sentences=1,
231
+ )
232
+ tts = sherpa_onnx.OfflineTts(tts_config)
233
+
234
+ return tts
235
+
236
+
237
+ @lru_cache(maxsize=10)
238
+ def _get_vits_hf(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
239
+ repo_id = repo_id.split("|")[0]
240
+
241
+ if "fanchen" in repo_id or "vits-cantonese-hf-xiaomaiiwn" in repo_id:
242
+ model = repo_id.split("/")[-1]
243
+ elif "csukuangfj/vits-melo-tts-zh_en" == repo_id:
244
+ model = "model"
245
+ else:
246
+ model = repo_id.split("-")[-1]
247
+
248
+ if "sherpa-onnx-vits-zh-ll" in repo_id:
249
+ model = "model"
250
+
251
+ if not Path("/tmp/dict").is_dir():
252
+ os.system(
253
+ "cd /tmp; curl -SL -O https://github.com/csukuangfj/cppjieba/releases/download/sherpa-onnx-2024-04-19/dict.tar.bz2; tar xvf dict.tar.bz2"
254
+ )
255
+ os.system("ls -lh /tmp/dict")
256
+
257
+ model = get_file(
258
+ repo_id=repo_id,
259
+ filename=f"{model}.onnx",
260
+ subfolder=".",
261
+ )
262
+
263
+ lexicon = get_file(
264
+ repo_id=repo_id,
265
+ filename="lexicon.txt",
266
+ subfolder=".",
267
+ )
268
+
269
+ tokens = get_file(
270
+ repo_id=repo_id,
271
+ filename="tokens.txt",
272
+ subfolder=".",
273
+ )
274
+
275
+ rule_fars = ""
276
+
277
+ if "vits-cantonese-hf-xiaomaiiwn" not in repo_id:
278
+ rule_fsts = ["phone.fst", "date.fst", "number.fst"]
279
+
280
+ rule_fsts = [
281
+ get_file(
282
+ repo_id=repo_id,
283
+ filename=f,
284
+ subfolder=".",
285
+ )
286
+ for f in rule_fsts
287
+ ]
288
+ rule_fsts = ",".join(rule_fsts)
289
+
290
+ # rule_fars = get_file(
291
+ # repo_id=repo_id,
292
+ # filename="rule.far",
293
+ # subfolder=".",
294
+ # )
295
+ vits_dict_dir = "/tmp/dict"
296
+ else:
297
+ rule_fsts = get_file(
298
+ repo_id=repo_id,
299
+ filename="rule.fst",
300
+ subfolder=".",
301
+ )
302
+ vits_dict_dir = ""
303
+
304
+ tts_config = sherpa_onnx.OfflineTtsConfig(
305
+ model=sherpa_onnx.OfflineTtsModelConfig(
306
+ vits=sherpa_onnx.OfflineTtsVitsModelConfig(
307
+ model=model,
308
+ lexicon=lexicon,
309
+ tokens=tokens,
310
+ dict_dir=vits_dict_dir,
311
+ length_scale=1.0 / speed,
312
+ ),
313
+ provider="cpu",
314
+ debug=True,
315
+ num_threads=2,
316
+ ),
317
+ rule_fsts=rule_fsts,
318
+ rule_fars=rule_fars,
319
+ max_num_sentences=1,
320
+ )
321
+ tts = sherpa_onnx.OfflineTts(tts_config)
322
+
323
+ return tts
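`_get_vits_hf` also derives the `.onnx` file stem from the repo id, with a later override for the `sherpa-onnx-vits-zh-ll` repository. A minimal sketch of just that derivation follows, using the repo ids from the `examples` list in `app.py`; the function name `hf_model_stem` is hypothetical.

```python
def hf_model_stem(repo_id: str) -> str:
    """Mirror the stem selection above without downloading any files."""
    repo_id = repo_id.split("|")[0]  # drop any "|<suffix>" part of the id
    if "fanchen" in repo_id or "vits-cantonese-hf-xiaomaiiwn" in repo_id:
        stem = repo_id.split("/")[-1]
    elif repo_id == "csukuangfj/vits-melo-tts-zh_en":
        stem = "model"
    else:
        stem = repo_id.split("-")[-1]
    # This check runs last, so it overrides whatever was chosen above.
    if "sherpa-onnx-vits-zh-ll" in repo_id:
        stem = "model"
    return stem


print(hf_model_stem("csukuangfj/vits-zh-hf-fanchen-wnj|1"))
print(hf_model_stem("csukuangfj/sherpa-onnx-vits-zh-ll|5"))
```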
324
+
325
+
326
+ @lru_cache(maxsize=10)
+ def get_pretrained_model(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+     if repo_id in chinese_models:
+         return chinese_models[repo_id](repo_id, speed)
+     elif repo_id in chinese_english_models:
+         return chinese_english_models[repo_id](repo_id, speed)
+     elif repo_id in cantonese_models:
+         return cantonese_models[repo_id](repo_id, speed)
+     elif repo_id in english_models:
+         return english_models[repo_id](repo_id, speed)
+     elif repo_id in german_models:
+         return german_models[repo_id](repo_id, speed)
+     elif repo_id in spanish_models:
+         return spanish_models[repo_id](repo_id, speed)
+     elif repo_id in french_models:
+         return french_models[repo_id](repo_id, speed)
+     elif repo_id in ukrainian_models:
+         return ukrainian_models[repo_id](repo_id, speed)
+     elif repo_id in russian_models:
+         return russian_models[repo_id](repo_id, speed)
+     elif repo_id in arabic_models:
+         return arabic_models[repo_id](repo_id, speed)
+     elif repo_id in catalan_models:
+         return catalan_models[repo_id](repo_id, speed)
+     elif repo_id in czech_models:
+         return czech_models[repo_id](repo_id, speed)
+     elif repo_id in danish_models:
+         return danish_models[repo_id](repo_id, speed)
+     elif repo_id in greek_models:
+         return greek_models[repo_id](repo_id, speed)
+     elif repo_id in finnish_models:
+         return finnish_models[repo_id](repo_id, speed)
+     elif repo_id in hungarian_models:
+         return hungarian_models[repo_id](repo_id, speed)
+     elif repo_id in icelandic_models:
+         return icelandic_models[repo_id](repo_id, speed)
+     elif repo_id in italian_models:
+         return italian_models[repo_id](repo_id, speed)
+     elif repo_id in georgian_models:
+         return georgian_models[repo_id](repo_id, speed)
+     elif repo_id in kazakh_models:
+         return kazakh_models[repo_id](repo_id, speed)
+     elif repo_id in luxembourgish_models:
+         return luxembourgish_models[repo_id](repo_id, speed)
+     elif repo_id in nepali_models:
+         return nepali_models[repo_id](repo_id, speed)
+     elif repo_id in dutch_models:
+         return dutch_models[repo_id](repo_id, speed)
+     elif repo_id in norwegian_models:
+         return norwegian_models[repo_id](repo_id, speed)
+     elif repo_id in polish_models:
+         return polish_models[repo_id](repo_id, speed)
+     elif repo_id in portuguese_models:
+         return portuguese_models[repo_id](repo_id, speed)
+     elif repo_id in romanian_models:
+         return romanian_models[repo_id](repo_id, speed)
+     elif repo_id in slovak_models:
+         return slovak_models[repo_id](repo_id, speed)
+     elif repo_id in serbian_models:
+         return serbian_models[repo_id](repo_id, speed)
+     elif repo_id in swedish_models:
+         return swedish_models[repo_id](repo_id, speed)
+     elif repo_id in swahili_models:
+         return swahili_models[repo_id](repo_id, speed)
+     elif repo_id in turkish_models:
+         return turkish_models[repo_id](repo_id, speed)
+     elif repo_id in vietnamese_models:
+         return vietnamese_models[repo_id](repo_id, speed)
+     elif repo_id in bulgarian_models:
+         return bulgarian_models[repo_id](repo_id, speed)
+     elif repo_id in estonian_models:
+         return estonian_models[repo_id](repo_id, speed)
+     elif repo_id in irish_models:
+         return irish_models[repo_id](repo_id, speed)
+     elif repo_id in croatian_models:
+         return croatian_models[repo_id](repo_id, speed)
+     elif repo_id in lithuanian_models:
+         return lithuanian_models[repo_id](repo_id, speed)
+     elif repo_id in latvian_models:
+         return latvian_models[repo_id](repo_id, speed)
+     elif repo_id in maltese_models:
+         return maltese_models[repo_id](repo_id, speed)
+     elif repo_id in slovenian_models:
+         return slovenian_models[repo_id](repo_id, speed)
+     elif repo_id in bengali_models:
+         return bengali_models[repo_id](repo_id, speed)
+     elif repo_id in min_nan_models:
+         return min_nan_models[repo_id](repo_id, speed)
+     elif repo_id in thai_models:
+         return thai_models[repo_id](repo_id, speed)
+     elif repo_id in persian_models:
+         return persian_models[repo_id](repo_id, speed)
+     elif repo_id in korean_models:
+         return korean_models[repo_id](repo_id, speed)
+     elif repo_id in afrikaans_models:
+         return afrikaans_models[repo_id](repo_id, speed)
+     elif repo_id in gujarati_models:
+         return gujarati_models[repo_id](repo_id, speed)
+     elif repo_id in tswana_models:
+         return tswana_models[repo_id](repo_id, speed)
+     elif repo_id in welsh_models:
+         return welsh_models[repo_id](repo_id, speed)
+     else:
+         raise ValueError(f"Unsupported repo_id: {repo_id}")
+
+
+ cantonese_models = {
+     "csukuangfj/vits-cantonese-hf-xiaomaiiwn": _get_vits_hf,
+ }
+
+ chinese_english_models = {
+     "csukuangfj/vits-melo-tts-zh_en|1": _get_vits_hf,  # 1
+ }
+
+ chinese_models = {
+     "csukuangfj/vits-zh-hf-fanchen-wnj|1": _get_vits_hf,  # 1
+     "csukuangfj/vits-zh-hf-fanchen-C|187": _get_vits_hf,  # 187
+     "csukuangfj/sherpa-onnx-vits-zh-ll|5": _get_vits_hf,  # 5
+     "csukuangfj/vits-zh-hf-keqing|804": _get_vits_hf,  # 804
+     "csukuangfj/vits-zh-hf-theresa|804": _get_vits_hf,  # 804
+     "csukuangfj/vits-zh-hf-eula|804": _get_vits_hf,  # 804
+     "csukuangfj/vits-zh-hf-echo|804": _get_vits_hf,  # 804
+     "csukuangfj/vits-zh-hf-bronya|804": _get_vits_hf,  # 804
+     "csukuangfj/vits-zh-hf-doom|804": _get_vits_hf,  # 804
+     "csukuangfj/vits-zh-hf-zenyatta|804": _get_vits_hf,  # 804
+     "csukuangfj/vits-zh-hf-abyssinvoker|804": _get_vits_hf,  # 804
+     "csukuangfj/vits-zh-hf-fanchen-ZhiHuiLaoZhe|1": _get_vits_hf,  # 1
+     "csukuangfj/vits-zh-hf-fanchen-ZhiHuiLaoZhe_new|1": _get_vits_hf,  # 1
+     "csukuangfj/vits-zh-hf-fanchen-unity|1": _get_vits_hf,  # 1
+     "csukuangfj/vits-zh-aishell3": _get_vits_zh_aishell3,
+     "csukuangfj/vits-piper-zh_CN-huayan-medium": _get_vits_piper,
+     # "csukuangfj/vits-piper-zh_CN-huayan-x_low": _get_vits_piper,
+ }
+
+ english_models = {
+     "csukuangfj/vits-piper-en_US-glados|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-southern_english_male-medium|8 speakers": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-southern_english_female-medium|6 speakers": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-bryce-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-john-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-norman-medium|1 speaker": _get_vits_piper,
+     # coqui-ai
+     "csukuangfj/vits-coqui-en-ljspeech|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-coqui-en-ljspeech-neon|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-coqui-en-vctk|109 speakers": _get_vits_piper,
+     # piper, US
+     "csukuangfj/vits-piper-en_GB-sweetbbak-amy|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-amy-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-amy-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-arctic-medium|18 speakers": _get_vits_piper,  # 18 speakers
+     "csukuangfj/vits-piper-en_US-danny-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-hfc_male-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-hfc_female-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-joe-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-kathleen-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-kusal-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-l2arctic-medium|24 speakers": _get_vits_piper,  # 24 speakers
+     "csukuangfj/vits-piper-en_US-lessac-high|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-lessac-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-lessac-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-libritts-high|904 speakers": _get_vits_piper,  # 904 speakers
+     "csukuangfj/vits-piper-en_US-libritts_r-medium|904 speakers": _get_vits_piper,  # 904 speakers
+     "csukuangfj/vits-piper-en_US-ljspeech-high|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-ljspeech-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-ryan-high|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-ryan-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_US-ryan-medium|1 speaker": _get_vits_piper,
+     # piper, GB
+     "csukuangfj/vits-piper-en_GB-alan-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-alan-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-alan-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-cori-high|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-cori-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-jenny_dioco-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-northern_english_male-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-semaine-medium|4 speakers": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-southern_english_female-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-en_GB-vctk-medium|109 speakers": _get_vits_piper,
+     #
+     "csukuangfj/vits-vctk|109 speakers": _get_vits_vctk,  # 109 speakers
+     "csukuangfj/vits-ljs|1 speaker": _get_vits_ljs,
+ }
+
+ german_models = {
+     "csukuangfj/vits-coqui-de-css10|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-de_DE-eva_k-x_low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-de_DE-karlsson-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-de_DE-kerstin-low|1 speaker": _get_vits_piper,
+     # "csukuangfj/vits-piper-de_DE-mls-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-de_DE-pavoque-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-de_DE-ramona-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-de_DE-thorsten-low|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-de_DE-thorsten-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-de_DE-thorsten-high|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-de_DE-thorsten_emotional-medium|8 speakers": _get_vits_piper,  # 8 speakers
+ }
+
+ spanish_models = {
+     # "csukuangfj/vits-coqui-es-css10": _get_vits_piper,
+     "csukuangfj/vits-piper-es-glados-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-es_ES-carlfm-x_low": _get_vits_piper,
+     "csukuangfj/vits-piper-es_ES-davefx-medium": _get_vits_piper,
+     # "csukuangfj/vits-piper-es_ES-mls_10246-low": _get_vits_piper,
+     # "csukuangfj/vits-piper-es_ES-mls_9972-low": _get_vits_piper,
+     "csukuangfj/vits-piper-es_ES-sharvard-medium": _get_vits_piper,  # 2 speakers
+     "csukuangfj/vits-piper-es_MX-ald-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-es_MX-claude-high": _get_vits_piper,
+     "csukuangfj/vits-mimic3-es_ES-m-ailabs_low": _get_vits_piper,
+ }
+
+ french_models = {
+     "csukuangfj/vits-coqui-fr-css10": _get_vits_piper,
+     # "csukuangfj/vits-piper-fr_FR-gilles-low": _get_vits_piper,
+     # "csukuangfj/vits-piper-fr_FR-mls_1840-low": _get_vits_piper,
+     # "csukuangfj/vits-piper-fr_FR-mls-medium": _get_vits_piper,  # 2 speakers, 0-female, 1-male
+     "csukuangfj/vits-piper-fr_FR-upmc-medium": _get_vits_piper,  # 2 speakers, 0-female, 1-male
+     "csukuangfj/vits-piper-fr_FR-tom-medium|1 speaker": _get_vits_piper,
+     "csukuangfj/vits-piper-fr_FR-siwis-low": _get_vits_piper,  # female
+     "csukuangfj/vits-piper-fr_FR-siwis-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-fr_FR-tjiho-model1": _get_vits_piper,
+     "csukuangfj/vits-piper-fr_FR-tjiho-model2": _get_vits_piper,
+     "csukuangfj/vits-piper-fr_FR-tjiho-model3": _get_vits_piper,
+ }
+
+ ukrainian_models = {
+     "csukuangfj/vits-piper-uk_UA-lada-x_low": _get_vits_piper,
+     "csukuangfj/vits-coqui-uk-mai": _get_vits_piper,
+     # "csukuangfj/vits-piper-uk_UA-ukrainian_tts-medium": _get_vits_piper,  # does not work somehow
+ }
+
+ russian_models = {
+     "csukuangfj/vits-piper-ru_RU-denis-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-ru_RU-dmitri-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-ru_RU-irina-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-ru_RU-ruslan-medium": _get_vits_piper,
+ }
+
+ arabic_models = {
+     "csukuangfj/vits-piper-ar_JO-kareem-low": _get_vits_piper,
+     "csukuangfj/vits-piper-ar_JO-kareem-medium": _get_vits_piper,
+ }
+
+ catalan_models = {
+     "csukuangfj/vits-piper-ca_ES-upc_ona-x_low": _get_vits_piper,
+     "csukuangfj/vits-piper-ca_ES-upc_ona-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-ca_ES-upc_pau-x_low": _get_vits_piper,
+ }
+
+ czech_models = {
+     "csukuangfj/vits-piper-cs_CZ-jirka-low": _get_vits_piper,
+     "csukuangfj/vits-piper-cs_CZ-jirka-medium": _get_vits_piper,
+     "csukuangfj/vits-coqui-cs-cv": _get_vits_piper,
+ }
+
+ danish_models = {
+     "csukuangfj/vits-coqui-da-cv": _get_vits_piper,
+     "csukuangfj/vits-piper-da_DK-talesyntese-medium": _get_vits_piper,
+ }
+
+ greek_models = {
+     "csukuangfj/vits-piper-el_GR-rapunzelina-low": _get_vits_piper,
+     # "csukuangfj/vits-mimic3-el_GR-rapunzelina_low": _get_vits_piper,
+ }
+
+ finnish_models = {
+     "csukuangfj/vits-coqui-fi-css10": _get_vits_piper,
+     "csukuangfj/vits-piper-fi_FI-harri-low": _get_vits_piper,
+     "csukuangfj/vits-piper-fi_FI-harri-medium": _get_vits_piper,
+     "csukuangfj/vits-mimic3-fi_FI-harri-tapani-ylilammi_low": _get_vits_piper,
+ }
+
+ hungarian_models = {
+     # "csukuangfj/vits-coqui-hu-css10": _get_vits_piper,
+     "csukuangfj/vits-piper-hu_HU-anna-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-hu_HU-berta-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-hu_HU-imre-medium": _get_vits_piper,
+     "csukuangfj/vits-mimic3-hu_HU-diana-majlinger_low": _get_vits_piper,
+ }
+
+ icelandic_models = {
+     "csukuangfj/vits-piper-is_IS-bui-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-is_IS-salka-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-is_IS-steinn-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-is_IS-ugla-medium": _get_vits_piper,
+ }
+
+ italian_models = {
+     "csukuangfj/vits-piper-it_IT-riccardo-x_low": _get_vits_piper,
+     "csukuangfj/vits-piper-it_IT-paola-medium": _get_vits_piper,
+ }
+
+ georgian_models = {
+     "csukuangfj/vits-piper-ka_GE-natia-medium": _get_vits_piper,
+ }
+
+ kazakh_models = {
+     "csukuangfj/vits-piper-kk_KZ-iseke-x_low": _get_vits_piper,
+     "csukuangfj/vits-piper-kk_KZ-issai-high": _get_vits_piper,
+     "csukuangfj/vits-piper-kk_KZ-raya-x_low": _get_vits_piper,
+ }
+
+ luxembourgish_models = {
+     "csukuangfj/vits-piper-lb_LU-marylux-medium": _get_vits_piper,
+ }
+
+ nepali_models = {
+     "csukuangfj/vits-piper-ne_NP-google-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-ne_NP-google-x_low": _get_vits_piper,
+     "csukuangfj/vits-mimic3-ne_NP-ne-google_low": _get_vits_piper,
+ }
+
+ dutch_models = {
+     "csukuangfj/vits-coqui-nl-css10": _get_vits_piper,
+     "csukuangfj/vits-piper-nl_BE-nathalie-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-nl_BE-nathalie-x_low": _get_vits_piper,
+     "csukuangfj/vits-piper-nl_BE-rdh-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-nl_BE-rdh-x_low": _get_vits_piper,
+     "csukuangfj/vits-piper-nl_NL-mls-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-nl_NL-mls_5809-low": _get_vits_piper,
+     "csukuangfj/vits-piper-nl_NL-mls_7432-low": _get_vits_piper,
+ }
+
+ norwegian_models = {
+     "csukuangfj/vits-piper-no_NO-talesyntese-medium": _get_vits_piper,
+ }
+
+ polish_models = {
+     "csukuangfj/vits-coqui-pl-mai_female": _get_vits_piper,
+     "csukuangfj/vits-piper-pl_PL-darkman-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-pl_PL-gosia-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-pl_PL-mc_speech-medium": _get_vits_piper,
+     # "csukuangfj/vits-piper-pl_PL-mls_6892-low": _get_vits_piper,
+     "csukuangfj/vits-mimic3-pl_PL-m-ailabs_low": _get_vits_piper,
+ }
+
+ portuguese_models = {
+     "csukuangfj/vits-coqui-pt-cv": _get_vits_piper,
+     "csukuangfj/vits-piper-pt_BR-edresson-low": _get_vits_piper,
+     "csukuangfj/vits-piper-pt_BR-faber-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-pt_PT-tugao-medium": _get_vits_piper,
+ }
+
+ romanian_models = {
+     "csukuangfj/vits-coqui-ro-cv": _get_vits_piper,
+     "csukuangfj/vits-piper-ro_RO-mihai-medium": _get_vits_piper,
+ }
+
+
+ slovak_models = {
+     "csukuangfj/vits-coqui-sk-cv": _get_vits_piper,
+     "csukuangfj/vits-piper-sk_SK-lili-medium": _get_vits_piper,
+ }
+
+ serbian_models = {
+     "csukuangfj/vits-piper-sr_RS-serbski_institut-medium": _get_vits_piper,
+ }
+
+ swedish_models = {
+     "csukuangfj/vits-coqui-sv-cv": _get_vits_piper,
+     "csukuangfj/vits-piper-sv_SE-nst-medium": _get_vits_piper,
+ }
+
+ swahili_models = {
+     "csukuangfj/vits-piper-sw_CD-lanfrica-medium": _get_vits_piper,
+ }
+
+ turkish_models = {
+     "csukuangfj/vits-piper-tr_TR-dfki-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-tr_TR-fahrettin-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-tr_TR-fettah-medium|1 speaker": _get_vits_piper,
+ }
+
+ vietnamese_models = {
+     "csukuangfj/vits-piper-vi_VN-25hours_single-low": _get_vits_piper,
+     "csukuangfj/vits-piper-vi_VN-vais1000-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-vi_VN-vivos-x_low": _get_vits_piper,
+     "csukuangfj/vits-mimic3-vi_VN-vais1000_low": _get_vits_piper,
+ }
+
+ bulgarian_models = {
+     "csukuangfj/vits-coqui-bg-cv": _get_vits_piper,
+ }
+
+ estonian_models = {
+     "csukuangfj/vits-coqui-et-cv": _get_vits_piper,
+ }
+
+ irish_models = {
+     "csukuangfj/vits-coqui-ga-cv": _get_vits_piper,
+ }
+
+ croatian_models = {
+     "csukuangfj/vits-coqui-hr-cv": _get_vits_piper,
+ }
+
+ lithuanian_models = {
+     "csukuangfj/vits-coqui-lt-cv": _get_vits_piper,
+ }
+
+ latvian_models = {
+     "csukuangfj/vits-coqui-lv-cv": _get_vits_piper,
+ }
+
+ maltese_models = {
+     "csukuangfj/vits-coqui-mt-cv": _get_vits_piper,
+ }
+
+ slovenian_models = {
+     "csukuangfj/vits-piper-sl_SI-artur-medium": _get_vits_piper,
+     "csukuangfj/vits-coqui-sl-cv": _get_vits_piper,
+ }
+
+ # Bangla
+ bengali_models = {
+     "csukuangfj/vits-coqui-bn-custom_female": _get_vits_piper,
+     "csukuangfj/vits-mimic3-bn-multi_low": _get_vits_piper,
+ }
+
+ min_nan_models = {
+     "csukuangfj/vits-mms-nan": _get_vits_mms,
+ }
+
+ thai_models = {
+     "csukuangfj/vits-mms-tha": _get_vits_mms,
+ }
+
+ persian_models = {
+     "csukuangfj/vits-piper-fa_IR-amir-medium": _get_vits_piper,
+     "csukuangfj/vits-piper-fa_IR-gyro-medium": _get_vits_piper,
+     "csukuangfj/vits-mimic3-fa-haaniye_low": _get_vits_piper,
+ }
+
+ korean_models = {
+     "csukuangfj/vits-mimic3-ko_KO-kss_low": _get_vits_piper,
+ }
+
+
+ afrikaans_models = {
+     "csukuangfj/vits-mimic3-af_ZA-google-nwu_low": _get_vits_piper,
+ }
+
+ gujarati_models = {
+     "csukuangfj/vits-mimic3-gu_IN-cmu-indic_low": _get_vits_piper,
+ }
+
+ tswana_models = {
+     "csukuangfj/vits-mimic3-tn_ZA-google-nwu_low": _get_vits_piper,
+ }
+
+ welsh_models = {
+     "csukuangfj/vits-piper-cy_GB-gwryw_gogleddol-medium|1 speaker": _get_vits_piper,
+ }
+
+ language_to_models = {
+     "English": list(english_models.keys()),
+     "Chinese (Mandarin, 普通话)": list(chinese_models.keys()),
+     "Chinese+English": list(chinese_english_models.keys()),
+     "Cantonese (粤语)": list(cantonese_models.keys()),
+     "Min-nan (闽南话)": list(min_nan_models.keys()),
+     "Arabic": list(arabic_models.keys()),
+     "Afrikaans": list(afrikaans_models.keys()),
+     "Bengali": list(bengali_models.keys()),
+     "Bulgarian": list(bulgarian_models.keys()),
+     "Catalan": list(catalan_models.keys()),
+     "Croatian": list(croatian_models.keys()),
+     "Czech": list(czech_models.keys()),
+     "Danish": list(danish_models.keys()),
+     "Dutch": list(dutch_models.keys()),
+     "Estonian": list(estonian_models.keys()),
+     "Finnish": list(finnish_models.keys()),
+     "French": list(french_models.keys()),
+     "Georgian": list(georgian_models.keys()),
+     "German": list(german_models.keys()),
+     "Greek": list(greek_models.keys()),
+     "Gujarati": list(gujarati_models.keys()),
+     "Hungarian": list(hungarian_models.keys()),
+     "Icelandic": list(icelandic_models.keys()),
+     "Irish": list(irish_models.keys()),
+     "Italian": list(italian_models.keys()),
+     "Kazakh": list(kazakh_models.keys()),
+     "Korean": list(korean_models.keys()),
+     "Latvian": list(latvian_models.keys()),
+     "Lithuanian": list(lithuanian_models.keys()),
+     "Luxembourgish": list(luxembourgish_models.keys()),
+     "Maltese": list(maltese_models.keys()),
+     "Nepali": list(nepali_models.keys()),
+     "Norwegian": list(norwegian_models.keys()),
+     "Persian": list(persian_models.keys()),
+     "Polish": list(polish_models.keys()),
+     "Portuguese": list(portuguese_models.keys()),
+     "Romanian": list(romanian_models.keys()),
+     "Russian": list(russian_models.keys()),
+     "Serbian": list(serbian_models.keys()),
+     "Slovak": list(slovak_models.keys()),
+     "Slovenian": list(slovenian_models.keys()),
+     "Spanish": list(spanish_models.keys()),
+     "Swahili": list(swahili_models.keys()),
+     "Swedish": list(swedish_models.keys()),
+     "Thai": list(thai_models.keys()),
+     "Tswana": list(tswana_models.keys()),
+     "Turkish": list(turkish_models.keys()),
+     "Ukrainian": list(ukrainian_models.keys()),
+     "Vietnamese": list(vietnamese_models.keys()),
+     "Welsh": list(welsh_models.keys()),
+ }
requirements (1).txt ADDED
@@ -0,0 +1,4 @@
+ https://huggingface.co/csukuangfj/sherpa-onnx-wheels/resolve/main/cpu/1.10.20/sherpa_onnx-1.10.20-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
+ #sherpa-onnx>=1.10.16
+
+ soundfile
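
Review note on the requirements file: the pinned wheel's file name carries the tag `cp38`, so it targets CPython 3.8 on x86_64 Linux. The sketch below is not part of this commit; it shows one way to read that tag from the URL and compare it with an interpreter version. The helper names `wheel_python_tag` and `matches_interpreter` are hypothetical, and the check deliberately ignores `abi3` and pure-Python (`py3`) wheels.

```python
import sys
from typing import Tuple

WHEEL_URL = (
    "https://huggingface.co/csukuangfj/sherpa-onnx-wheels/resolve/main/cpu/"
    "1.10.20/sherpa_onnx-1.10.20-cp38-cp38-"
    "manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)


def wheel_python_tag(wheel_url: str) -> str:
    """Return the Python tag (for example 'cp38') from a wheel name or URL."""
    filename = wheel_url.rsplit("/", 1)[-1]
    if not filename.endswith(".whl"):
        raise ValueError(f"Not a wheel file: {filename}")
    # Wheel names end with: ...-{python tag}-{abi tag}-{platform tag}.whl
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"Unexpected wheel file name: {filename}")
    return parts[-3]


def matches_interpreter(tag: str, version: Tuple[int, int]) -> bool:
    """True if a CPython tag such as 'cp38' equals the given (major, minor)."""
    return tag == f"cp{version[0]}{version[1]}"


if __name__ == "__main__":
    tag = wheel_python_tag(WHEEL_URL)
    current = (sys.version_info[0], sys.version_info[1])
    print(f"wheel tag: {tag}, matches this interpreter: {matches_interpreter(tag, current)}")
```

If the Space runs a different Python version, the commented-out `sherpa-onnx>=1.10.16` line would let pip pick a compatible wheel instead.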