lakikishorsubba committed on
Commit
16ba0ed
1 Parent(s): 333b795

Upload 4 files

Files changed (4)
  1. LICENSE +202 -0
  2. app.py +241 -0
  3. model.py +795 -0
  4. requirements.txt +4 -0
LICENSE ADDED
@@ -0,0 +1,202 @@
+
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by
+ the copyright owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems,
+ and issue tracking systems that are managed by, or on behalf of, the
+ Licensor for the purpose of discussing and improving the Work, but
+ excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
app.py ADDED
@@ -0,0 +1,241 @@
+ #!/usr/bin/env python3
+ #
+ # Copyright 2022-2023 Xiaomi Corp. (authors: Fangjun Kuang)
+ #
+ # See LICENSE for clarification regarding multiple authors
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ # References:
+ # https://gradio.app/docs/#dropdown
+
+ import logging
+ import os
+ import time
+ import uuid
+
+ import gradio as gr
+ import soundfile as sf
+
+ from model import get_pretrained_model, language_to_models
+
+ title = "# Next-gen Kaldi: Text-to-speech (TTS)"
+
+ description = """
+ This space shows how to convert text to speech with Next-gen Kaldi.
+
+ It is running on CPU within a docker container provided by Hugging Face.
+
+ See more information by visiting the following links:
+
+ - <https://github.com/k2-fsa/sherpa-onnx>
+
+ If you want to deploy it locally, please see
+ <https://k2-fsa.github.io/sherpa/>
+
+ If you want to use Android APKs, please see
+ <https://k2-fsa.github.io/sherpa/onnx/tts/apk.html>
+
+ If you want to use Android text-to-speech engine APKs, please see
+ <https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html>
+
+ If you want to download an all-in-one exe for Windows, please see
+ <https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models>
+
+ """
+
+ # css style is copied from
+ # https://huggingface.co/spaces/alphacep/asr/blob/main/app.py#L113
+ css = """
+ .result {display:flex;flex-direction:column}
+ .result_item {padding:15px;margin-bottom:8px;border-radius:15px;width:100%}
+ .result_item_success {background-color:mediumaquamarine;color:white;align-self:start}
+ .result_item_error {background-color:#ff7070;color:white;align-self:start}
+ """
+
+ examples = [
+     ["Min-nan (闽南话)", "csukuangfj/vits-mms-nan", "ài piaǸ chiah ē iaN̂", 0, 1.0],
+     ["Thai", "csukuangfj/vits-mms-tha", "ฉันรักคุณ", 0, 1.0],
+ ]
+
+
+ def update_model_dropdown(language: str):
+     if language in language_to_models:
+         choices = language_to_models[language]
+         return gr.Dropdown(
+             choices=choices,
+             value=choices[0],
+             interactive=True,
+         )
+
+     raise ValueError(f"Unsupported language: {language}")
+
+
+ def build_html_output(s: str, style: str = "result_item_success"):
+     return f"""
+     <div class='result'>
+         <div class='result_item {style}'>
+           {s}
+         </div>
+     </div>
+     """
+
+
+ def process(language: str, repo_id: str, text: str, sid: str, speed: float):
+     logging.info(f"Input text: {text}. sid: {sid}, speed: {speed}")
+     sid = int(sid)
+     tts = get_pretrained_model(repo_id, speed)
+
+     start = time.time()
+     audio = tts.generate(text, sid=sid)
+     end = time.time()
+
+     if len(audio.samples) == 0:
+         raise ValueError(
+             "Error in generating audio. Please read the previous error messages."
+         )
+
+     duration = len(audio.samples) / audio.sample_rate
+
+     elapsed_seconds = end - start
+     rtf = elapsed_seconds / duration
+
+     info = f"""
+     Wave duration : {duration:.3f} s <br/>
+     Processing time: {elapsed_seconds:.3f} s <br/>
+     RTF: {elapsed_seconds:.3f}/{duration:.3f} = {rtf:.3f} <br/>
+     """
+
+     logging.info(info)
+     logging.info(f"\nrepo_id: {repo_id}\ntext: {text}\nsid: {sid}\nspeed: {speed}")
+
+     filename = str(uuid.uuid4())
+     filename = f"{filename}.wav"
+     sf.write(
+         filename,
+         audio.samples,
+         samplerate=audio.sample_rate,
+         subtype="PCM_16",
+     )
+
+     return filename, build_html_output(info)
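
Note on `process`: the function reports a real-time factor (RTF), which is processing time divided by the duration of the generated audio, so values below 1 mean synthesis ran faster than playback. The sketch below isolates that arithmetic. The helper name `real_time_factor` is hypothetical and is not part of this commit.

```python
def real_time_factor(elapsed_seconds: float, num_samples: int, sample_rate: int) -> float:
    """Return processing time divided by audio duration (hypothetical helper)."""
    if num_samples <= 0 or sample_rate <= 0:
        # Mirrors the empty-audio guard in process(): no samples, no duration.
        raise ValueError("num_samples and sample_rate must be positive")
    duration = num_samples / sample_rate  # seconds of generated audio
    return elapsed_seconds / duration


# 0.5 s of compute for 22050 samples at 22050 Hz (1 s of audio).
example_rtf = real_time_factor(0.5, 22050, 22050)
```

Guarding against zero samples before dividing matches the order used in `process`, where the empty-audio check precedes the duration computation.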
+
+
+ demo = gr.Blocks(css=css)
+
+
+ with demo:
+     gr.Markdown(title)
+     language_choices = list(language_to_models.keys())
+
+     language_radio = gr.Radio(
+         label="Language",
+         choices=language_choices,
+         value=language_choices[0],
+     )
+
+     model_dropdown = gr.Dropdown(
+         choices=language_to_models[language_choices[0]],
+         label="Select a model",
+         value=language_to_models[language_choices[0]][0],
+     )
+
+     language_radio.change(
+         update_model_dropdown,
+         inputs=language_radio,
+         outputs=model_dropdown,
+     )
+
+     with gr.Tabs():
+         with gr.TabItem("Please input your text"):
+             input_text = gr.Textbox(
+                 label="Input text",
+                 info="Your text",
+                 lines=3,
+                 placeholder="Please input your text here",
+             )
+
+             input_sid = gr.Textbox(
+                 label="Speaker ID",
+                 info="Speaker ID",
+                 lines=1,
+                 max_lines=1,
+                 value="0",
+                 placeholder="Speaker ID. Valid only for multi-speaker model",
+             )
+
+             input_speed = gr.Slider(
+                 minimum=0.1,
+                 maximum=10,
+                 value=1,
+                 step=0.1,
+                 label="Speed (larger->faster; smaller->slower)",
+             )
+
+             input_button = gr.Button("Submit")
+
+             output_audio = gr.Audio(label="Output")
+
+             output_info = gr.HTML(label="Info")
+
+             gr.Examples(
+                 examples=examples,
+                 fn=process,
+                 inputs=[
+                     language_radio,
+                     model_dropdown,
+                     input_text,
+                     input_sid,
+                     input_speed,
+                 ],
+                 outputs=[
+                     output_audio,
+                     output_info,
+                 ],
+             )
+
+         input_button.click(
+             process,
+             inputs=[
+                 language_radio,
+                 model_dropdown,
+                 input_text,
+                 input_sid,
+                 input_speed,
+             ],
+             outputs=[
+                 output_audio,
+                 output_info,
+             ],
+         )
+
+     gr.Markdown(description)
+
+
+ def download_espeak_ng_data():
+     os.system(
+         """
+     cd /tmp
+     wget -qq https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/espeak-ng-data.tar.bz2
+     tar xf espeak-ng-data.tar.bz2
+     """
+     )
+
+
+ if __name__ == "__main__":
+     download_espeak_ng_data()
+     formatter = "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"
+
+     logging.basicConfig(format=formatter, level=logging.INFO)
+
+     demo.launch()
model.py ADDED
@@ -0,0 +1,795 @@
+ # Copyright 2022-2023 Xiaomi Corp. (authors: Fangjun Kuang)
+ #
+ # See LICENSE for clarification regarding multiple authors
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ import os
+ from functools import lru_cache
+ from pathlib import Path
+
+ import sherpa_onnx
+ from huggingface_hub import hf_hub_download
+
+
+ def get_file(
+     repo_id: str,
+     filename: str,
+     subfolder: str = ".",
+ ) -> str:
+     model_filename = hf_hub_download(
+         repo_id=repo_id,
+         filename=filename,
+         subfolder=subfolder,
+     )
+     return model_filename
+
+
+ @lru_cache(maxsize=10)
+ def _get_vits_vctk(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+     assert repo_id == "csukuangfj/vits-vctk"
+
+     model = get_file(
+         repo_id=repo_id,
+         filename="vits-vctk.onnx",
+         subfolder=".",
+     )
+
+     lexicon = get_file(
+         repo_id=repo_id,
+         filename="lexicon.txt",
+         subfolder=".",
+     )
+
+     tokens = get_file(
+         repo_id=repo_id,
+         filename="tokens.txt",
+         subfolder=".",
+     )
+
+     tts_config = sherpa_onnx.OfflineTtsConfig(
+         model=sherpa_onnx.OfflineTtsModelConfig(
+             vits=sherpa_onnx.OfflineTtsVitsModelConfig(
+                 model=model,
+                 lexicon=lexicon,
+                 tokens=tokens,
+                 length_scale=1.0 / speed,
+             ),
+             provider="cpu",
+             debug=True,
+             num_threads=2,
+         )
+     )
+     tts = sherpa_onnx.OfflineTts(tts_config)
+
+     return tts
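
Note on caching: each loader in this file is wrapped in `functools.lru_cache`, so the cache key is the full argument tuple `(repo_id, speed)`. A different speed for the same repo is therefore a cache miss and would build a second `OfflineTts` instance. The sketch below illustrates that behavior with a stand-in loader. `load_stub` is hypothetical and returns a plain dict rather than a real model.

```python
from functools import lru_cache


@lru_cache(maxsize=10)
def load_stub(repo_id: str, speed: float) -> dict:
    # Stand-in for an expensive model load; mirrors length_scale = 1.0 / speed.
    return {"repo_id": repo_id, "length_scale": 1.0 / speed}


first = load_stub("demo/repo", 1.0)
second = load_stub("demo/repo", 1.0)  # same arguments: served from the cache
third = load_stub("demo/repo", 2.0)   # different speed: a separate cache entry
info = load_stub.cache_info()         # hit/miss counters kept by lru_cache
```

With `maxsize=10`, at most ten distinct `(repo_id, speed)` pairs stay resident per decorated function before the least recently used entry is evicted.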
+
+
+ @lru_cache(maxsize=10)
+ def _get_vits_ljs(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+     assert repo_id == "csukuangfj/vits-ljs"
+
+     model = get_file(
+         repo_id=repo_id,
+         filename="vits-ljs.onnx",
+         subfolder=".",
+     )
+
+     lexicon = get_file(
+         repo_id=repo_id,
+         filename="lexicon.txt",
+         subfolder=".",
+     )
+
+     tokens = get_file(
+         repo_id=repo_id,
+         filename="tokens.txt",
+         subfolder=".",
+     )
+
+     tts_config = sherpa_onnx.OfflineTtsConfig(
+         model=sherpa_onnx.OfflineTtsModelConfig(
+             vits=sherpa_onnx.OfflineTtsVitsModelConfig(
+                 model=model,
+                 lexicon=lexicon,
+                 tokens=tokens,
+                 length_scale=1.0 / speed,
+             ),
+             provider="cpu",
+             debug=True,
+             num_threads=2,
+         )
+     )
+     tts = sherpa_onnx.OfflineTts(tts_config)
+
+     return tts
+
+
+ @lru_cache(maxsize=10)
+ def _get_vits_piper(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+     data_dir = "/tmp/espeak-ng-data"
+     if "coqui" in repo_id or "vits-mms" in repo_id:
+         name = "model"
+     elif "piper" in repo_id:
+         n = len("vits-piper-")
+         name = repo_id.split("/")[1][n:]
+     elif "mimic3" in repo_id:
+         n = len("vits-mimic3-")
+         name = repo_id.split("/")[1][n:]
+     else:
+         raise ValueError(f"Unsupported {repo_id}")
+
+     if "vits-coqui-uk-mai" in repo_id or "vits-mms" in repo_id:
+         data_dir = ""
+
+     model = get_file(
+         repo_id=repo_id,
+         filename=f"{name}.onnx",
+         subfolder=".",
+     )
+
+     tokens = get_file(
+         repo_id=repo_id,
+         filename="tokens.txt",
+         subfolder=".",
+     )
+
+     tts_config = sherpa_onnx.OfflineTtsConfig(
+         model=sherpa_onnx.OfflineTtsModelConfig(
+             vits=sherpa_onnx.OfflineTtsVitsModelConfig(
+                 model=model,
+                 lexicon="",
+                 data_dir=data_dir,
+                 tokens=tokens,
+                 length_scale=1.0 / speed,
+             ),
+             provider="cpu",
+             debug=True,
+             num_threads=2,
+         )
+     )
+     tts = sherpa_onnx.OfflineTts(tts_config)
+
+     return tts
+
+
+ @lru_cache(maxsize=10)
+ def _get_vits_mms(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+     return _get_vits_piper(repo_id, speed)
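
Note on `_get_vits_piper`: the ONNX file name is derived from the repo id. Coqui and MMS repos use `model.onnx`, while piper and mimic3 repos use whatever follows the `vits-piper-` or `vits-mimic3-` prefix in the repo name. The sketch below restates only that rule as a pure function, with no download involved. `onnx_stem` is a hypothetical name, and the piper and mimic3 repo ids in the usage lines are illustrative assumptions that do not appear in this diff.

```python
def onnx_stem(repo_id: str) -> str:
    """Return the ONNX file stem implied by a repo id (hypothetical helper)."""
    # Same branch order as _get_vits_piper: coqui/MMS first, then piper, then mimic3.
    if "coqui" in repo_id or "vits-mms" in repo_id:
        return "model"
    if "piper" in repo_id:
        return repo_id.split("/")[1][len("vits-piper-"):]
    if "mimic3" in repo_id:
        return repo_id.split("/")[1][len("vits-mimic3-"):]
    raise ValueError(f"Unsupported {repo_id}")


mms_stem = onnx_stem("csukuangfj/vits-mms-tha")                # repo id from app.py examples
piper_stem = onnx_stem("someuser/vits-piper-en_US-amy-low")    # assumed repo id
mimic3_stem = onnx_stem("someuser/vits-mimic3-pl_PL-demo")     # assumed repo id
```

Because the checks are substring tests on the whole repo id, a user or organization name containing `piper` or `coqui` would also match, which may be worth keeping in mind when adding repos.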
+
+
+ @lru_cache(maxsize=10)
+ def _get_vits_zh_aishell3(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+     assert repo_id == "csukuangfj/vits-zh-aishell3"
+
+     model = get_file(
+         repo_id=repo_id,
+         filename="vits-aishell3.onnx",
+         subfolder=".",
+     )
+
+     lexicon = get_file(
+         repo_id=repo_id,
+         filename="lexicon.txt",
+         subfolder=".",
+     )
+
+     tokens = get_file(
+         repo_id=repo_id,
+         filename="tokens.txt",
+         subfolder=".",
+     )
+
+     rule_fsts = ["phone.fst", "date.fst", "number.fst", "new_heteronym.fst"]
+
+     rule_fsts = [
+         get_file(
+             repo_id=repo_id,
+             filename=f,
+             subfolder=".",
+         )
+         for f in rule_fsts
+     ]
+     rule_fsts = ",".join(rule_fsts)
+
+     rule_fars = get_file(
+         repo_id=repo_id,
+         filename="rule.far",
+         subfolder=".",
+     )
+
+     tts_config = sherpa_onnx.OfflineTtsConfig(
+         model=sherpa_onnx.OfflineTtsModelConfig(
+             vits=sherpa_onnx.OfflineTtsVitsModelConfig(
+                 model=model,
+                 lexicon=lexicon,
+                 tokens=tokens,
+                 length_scale=1.0 / speed,
+             ),
+             provider="cpu",
+             debug=True,
+             num_threads=2,
+         ),
+         rule_fsts=rule_fsts,
+         rule_fars=rule_fars,
+     )
+     tts = sherpa_onnx.OfflineTts(tts_config)
+
+     return tts
+
+
+ @lru_cache(maxsize=10)
+ def _get_vits_hf(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+     repo_id = repo_id.split("|")[0]
+
+     if "fanchen" in repo_id or "vits-cantonese-hf-xiaomaiiwn" in repo_id:
+         model = repo_id.split("/")[-1]
+     else:
+         model = repo_id.split("-")[-1]
+
+     if not Path("/tmp/dict").is_dir():
+         os.system(
+             "cd /tmp; curl -SL -O https://github.com/csukuangfj/cppjieba/releases/download/sherpa-onnx-2024-04-19/dict.tar.bz2; tar xvf dict.tar.bz2"
+         )
+         os.system("ls -lh /tmp/dict")
+
+     model = get_file(
+         repo_id=repo_id,
+         filename=f"{model}.onnx",
+         subfolder=".",
+     )
+
+     lexicon = get_file(
+         repo_id=repo_id,
+         filename="lexicon.txt",
+         subfolder=".",
+     )
+
+     tokens = get_file(
+         repo_id=repo_id,
+         filename="tokens.txt",
+         subfolder=".",
+     )
+
+     rule_fars = ""
+
+     if "vits-cantonese-hf-xiaomaiiwn" not in repo_id:
+         rule_fsts = ["phone.fst", "date.fst", "number.fst", "new_heteronym.fst"]
+
+         rule_fsts = [
+             get_file(
+                 repo_id=repo_id,
+                 filename=f,
+                 subfolder=".",
+             )
+             for f in rule_fsts
+         ]
+         rule_fsts = ",".join(rule_fsts)
+
+         # rule_fars = get_file(
+         #     repo_id=repo_id,
+         #     filename="rule.far",
+         #     subfolder=".",
+         # )
+         vits_dict_dir = "/tmp/dict"
+     else:
+         rule_fsts = get_file(
+             repo_id=repo_id,
+             filename="rule.fst",
+             subfolder=".",
+         )
+         vits_dict_dir = ""
+
+     tts_config = sherpa_onnx.OfflineTtsConfig(
+         model=sherpa_onnx.OfflineTtsModelConfig(
+             vits=sherpa_onnx.OfflineTtsVitsModelConfig(
+                 model=model,
+                 lexicon=lexicon,
+                 tokens=tokens,
+                 dict_dir=vits_dict_dir,
+                 length_scale=1.0 / speed,
+             ),
+             provider="cpu",
+             debug=True,
+             num_threads=2,
+         ),
+         rule_fsts=rule_fsts,
+         rule_fars=rule_fars,
+     )
+     tts = sherpa_onnx.OfflineTts(tts_config)
+
+     return tts
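
Note on `_get_vits_hf`: it receives dictionary keys such as `csukuangfj/vits-zh-hf-keqing|804`, drops everything after `|`, and then picks the ONNX file stem. For `fanchen` and Cantonese repos the stem is the full repo name; otherwise it is the last `-`-separated token. The sketch below isolates that parsing. `parse_hf_key` is a hypothetical helper, and the number after `|` is simply discarded here, as it is in the function above.

```python
from typing import Tuple


def parse_hf_key(key: str) -> Tuple[str, str]:
    """Split a registry key into (repo_id, onnx_stem); hypothetical helper."""
    repo_id = key.split("|")[0]  # discard the "|<number>" suffix
    if "fanchen" in repo_id or "vits-cantonese-hf-xiaomaiiwn" in repo_id:
        stem = repo_id.split("/")[-1]  # whole repo name, e.g. vits-zh-hf-fanchen-C
    else:
        stem = repo_id.split("-")[-1]  # last dash-separated token, e.g. keqing
    return repo_id, stem


keqing = parse_hf_key("csukuangfj/vits-zh-hf-keqing|804")
fanchen = parse_hf_key("csukuangfj/vits-zh-hf-fanchen-C|187")
```

Keeping the parsing in a pure function like this would make the naming rule testable without touching the network.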
+
+
+ @lru_cache(maxsize=10)
+ def get_pretrained_model(repo_id: str, speed: float) -> sherpa_onnx.OfflineTts:
+     if repo_id in chinese_models:
+         return chinese_models[repo_id](repo_id, speed)
+     if repo_id in cantonese_models:
+         return cantonese_models[repo_id](repo_id, speed)
+     elif repo_id in english_models:
+         return english_models[repo_id](repo_id, speed)
+     elif repo_id in german_models:
+         return german_models[repo_id](repo_id, speed)
+     elif repo_id in spanish_models:
+         return spanish_models[repo_id](repo_id, speed)
+     elif repo_id in french_models:
+         return french_models[repo_id](repo_id, speed)
+     elif repo_id in ukrainian_models:
+         return ukrainian_models[repo_id](repo_id, speed)
+     elif repo_id in russian_models:
+         return russian_models[repo_id](repo_id, speed)
+     elif repo_id in arabic_models:
+         return arabic_models[repo_id](repo_id, speed)
+     elif repo_id in catalan_models:
+         return catalan_models[repo_id](repo_id, speed)
+     elif repo_id in czech_models:
+         return czech_models[repo_id](repo_id, speed)
+     elif repo_id in danish_models:
+         return danish_models[repo_id](repo_id, speed)
+     elif repo_id in greek_models:
+         return greek_models[repo_id](repo_id, speed)
+     elif repo_id in finnish_models:
+         return finnish_models[repo_id](repo_id, speed)
+     elif repo_id in hungarian_models:
+         return hungarian_models[repo_id](repo_id, speed)
+     elif repo_id in icelandic_models:
+         return icelandic_models[repo_id](repo_id, speed)
+     elif repo_id in italian_models:
+         return italian_models[repo_id](repo_id, speed)
+     elif repo_id in georgian_models:
+         return georgian_models[repo_id](repo_id, speed)
+     elif repo_id in kazakh_models:
+         return kazakh_models[repo_id](repo_id, speed)
+     elif repo_id in luxembourgish_models:
+         return luxembourgish_models[repo_id](repo_id, speed)
+     elif repo_id in nepali_models:
+         return nepali_models[repo_id](repo_id, speed)
+     elif repo_id in dutch_models:
+         return dutch_models[repo_id](repo_id, speed)
+     elif repo_id in norwegian_models:
+         return norwegian_models[repo_id](repo_id, speed)
+     elif repo_id in polish_models:
+         return polish_models[repo_id](repo_id, speed)
+     elif repo_id in portuguese_models:
+         return portuguese_models[repo_id](repo_id, speed)
+     elif repo_id in romanian_models:
+         return romanian_models[repo_id](repo_id, speed)
+     elif repo_id in slovak_models:
+         return slovak_models[repo_id](repo_id, speed)
+     elif repo_id in serbian_models:
+         return serbian_models[repo_id](repo_id, speed)
+     elif repo_id in swedish_models:
+         return swedish_models[repo_id](repo_id, speed)
+     elif repo_id in swahili_models:
+         return swahili_models[repo_id](repo_id, speed)
+     elif repo_id in turkish_models:
+         return turkish_models[repo_id](repo_id, speed)
+     elif repo_id in vietnamese_models:
+         return vietnamese_models[repo_id](repo_id, speed)
+     elif repo_id in bulgarian_models:
+         return bulgarian_models[repo_id](repo_id, speed)
+     elif repo_id in estonian_models:
+         return estonian_models[repo_id](repo_id, speed)
+     elif repo_id in irish_models:
+         return irish_models[repo_id](repo_id, speed)
+     elif repo_id in croatian_models:
+         return croatian_models[repo_id](repo_id, speed)
+     elif repo_id in lithuanian_models:
+         return lithuanian_models[repo_id](repo_id, speed)
+     elif repo_id in latvian_models:
+         return latvian_models[repo_id](repo_id, speed)
+     elif repo_id in maltese_models:
+         return maltese_models[repo_id](repo_id, speed)
+     elif repo_id in slovenian_models:
+         return slovenian_models[repo_id](repo_id, speed)
+     elif repo_id in bengali_models:
+         return bengali_models[repo_id](repo_id, speed)
+     elif repo_id in min_nan_models:
+         return min_nan_models[repo_id](repo_id, speed)
+     elif repo_id in thai_models:
+         return thai_models[repo_id](repo_id, speed)
+     elif repo_id in persian_models:
+         return persian_models[repo_id](repo_id, speed)
+     elif repo_id in korean_models:
+         return korean_models[repo_id](repo_id, speed)
+     elif repo_id in afrikaans_models:
+         return afrikaans_models[repo_id](repo_id, speed)
+     elif repo_id in gujarati_models:
+         return gujarati_models[repo_id](repo_id, speed)
+     elif repo_id in tswana_models:
+         return tswana_models[repo_id](repo_id, speed)
+     else:
+         raise ValueError(f"Unsupported repo_id: {repo_id}")
414
+
415
+
416
+ cantonese_models = {
417
+ "csukuangfj/vits-cantonese-hf-xiaomaiiwn": _get_vits_hf,
418
+ }
419
+
420
+ chinese_models = {
421
+ "csukuangfj/vits-zh-hf-keqing|804": _get_vits_hf, # 804
422
+ "csukuangfj/vits-zh-hf-theresa|804": _get_vits_hf, # 804
423
+ "csukuangfj/vits-zh-hf-eula|804": _get_vits_hf, # 804
424
+ "csukuangfj/vits-zh-hf-echo|804": _get_vits_hf, # 804
425
+ "csukuangfj/vits-zh-hf-bronya|804": _get_vits_hf, # 804
426
+ "csukuangfj/vits-zh-hf-doom|804": _get_vits_hf, # 804
427
+ "csukuangfj/vits-zh-hf-zenyatta|804": _get_vits_hf, # 804
428
+ "csukuangfj/vits-zh-hf-abyssinvoker|804": _get_vits_hf, # 804
429
+ "csukuangfj/vits-zh-hf-fanchen-wnj|1": _get_vits_hf, # 1
430
+ "csukuangfj/vits-zh-hf-fanchen-C|187": _get_vits_hf, # 187
431
+ "csukuangfj/vits-zh-hf-fanchen-ZhiHuiLaoZhe|1": _get_vits_hf, # 1
432
+ "csukuangfj/vits-zh-hf-fanchen-ZhiHuiLaoZhe_new|1": _get_vits_hf, # 1
433
+ "csukuangfj/vits-zh-hf-fanchen-unity|1": _get_vits_hf, # 1
434
+ "csukuangfj/vits-zh-aishell3": _get_vits_zh_aishell3,
435
+ "csukuangfj/vits-piper-zh_CN-huayan-medium": _get_vits_piper,
436
+ # "csukuangfj/vits-piper-zh_CN-huayan-x_low": _get_vits_piper,
437
+ }
438
+
439
+ english_models = {
440
+ "csukuangfj/vits-piper-en_US-glados": _get_vits_piper,
441
+ # coqui-ai
442
+ "csukuangfj/vits-coqui-en-ljspeech": _get_vits_piper,
443
+ "csukuangfj/vits-coqui-en-ljspeech-neon": _get_vits_piper,
444
+ "csukuangfj/vits-coqui-en-vctk": _get_vits_piper,
445
+ # piper, US
446
+ "csukuangfj/vits-piper-en_GB-sweetbbak-amy": _get_vits_piper,
447
+ "csukuangfj/vits-piper-en_US-amy-low": _get_vits_piper,
448
+ "csukuangfj/vits-piper-en_US-amy-medium": _get_vits_piper,
449
+ "csukuangfj/vits-piper-en_US-arctic-medium": _get_vits_piper, # 18 speakers
450
+ "csukuangfj/vits-piper-en_US-danny-low": _get_vits_piper,
451
+ "csukuangfj/vits-piper-en_US-hfc_male-medium": _get_vits_piper,
452
+ "csukuangfj/vits-piper-en_US-joe-medium": _get_vits_piper,
453
+ "csukuangfj/vits-piper-en_US-kathleen-low": _get_vits_piper,
454
+ "csukuangfj/vits-piper-en_US-kusal-medium": _get_vits_piper,
455
+ "csukuangfj/vits-piper-en_US-l2arctic-medium": _get_vits_piper, # 24 speakers
456
+ "csukuangfj/vits-piper-en_US-lessac-high": _get_vits_piper,
457
+ "csukuangfj/vits-piper-en_US-lessac-low": _get_vits_piper,
458
+ "csukuangfj/vits-piper-en_US-lessac-medium": _get_vits_piper,
459
+ "csukuangfj/vits-piper-en_US-libritts-high": _get_vits_piper, # 904 speakers
460
+ "csukuangfj/vits-piper-en_US-libritts_r-medium": _get_vits_piper, # 904 speakers
461
+ "csukuangfj/vits-piper-en_US-ljspeech-high": _get_vits_piper,
462
+ "csukuangfj/vits-piper-en_US-ljspeech-medium": _get_vits_piper,
463
+ "csukuangfj/vits-piper-en_US-ryan-high": _get_vits_piper,
464
+ "csukuangfj/vits-piper-en_US-ryan-low": _get_vits_piper,
465
+ "csukuangfj/vits-piper-en_US-ryan-medium": _get_vits_piper,
466
+ # piper, GB
467
+ "csukuangfj/vits-piper-en_GB-alan-low": _get_vits_piper,
468
+ "csukuangfj/vits-piper-en_GB-alan-medium": _get_vits_piper,
470
+ "csukuangfj/vits-piper-en_GB-cori-high": _get_vits_piper,
471
+ "csukuangfj/vits-piper-en_GB-cori-medium": _get_vits_piper,
472
+ "csukuangfj/vits-piper-en_GB-jenny_dioco-medium": _get_vits_piper,
473
+ "csukuangfj/vits-piper-en_GB-northern_english_male-medium": _get_vits_piper,
474
+ "csukuangfj/vits-piper-en_GB-semaine-medium": _get_vits_piper,
475
+ "csukuangfj/vits-piper-en_GB-southern_english_female-low": _get_vits_piper,
476
+ "csukuangfj/vits-piper-en_GB-vctk-medium": _get_vits_piper,
477
+ #
478
+ "csukuangfj/vits-vctk": _get_vits_vctk, # 109 speakers
479
+ "csukuangfj/vits-ljs": _get_vits_ljs,
480
+ }
481
+
482
+ german_models = {
483
+ "csukuangfj/vits-coqui-de-css10": _get_vits_piper,
484
+ "csukuangfj/vits-piper-de_DE-eva_k-x_low": _get_vits_piper,
485
+ "csukuangfj/vits-piper-de_DE-karlsson-low": _get_vits_piper,
486
+ "csukuangfj/vits-piper-de_DE-kerstin-low": _get_vits_piper,
487
+ "csukuangfj/vits-piper-de_DE-mls-medium": _get_vits_piper,
488
+ "csukuangfj/vits-piper-de_DE-pavoque-low": _get_vits_piper,
489
+ "csukuangfj/vits-piper-de_DE-ramona-low": _get_vits_piper,
490
+ "csukuangfj/vits-piper-de_DE-thorsten-low": _get_vits_piper,
491
+ "csukuangfj/vits-piper-de_DE-thorsten-medium": _get_vits_piper,
492
+ "csukuangfj/vits-piper-de_DE-thorsten-high": _get_vits_piper,
493
+ "csukuangfj/vits-piper-de_DE-thorsten_emotional-medium": _get_vits_piper, # 8 speakers
494
+ }
495
+
496
+ spanish_models = {
497
+ # "csukuangfj/vits-coqui-es-css10": _get_vits_piper,
498
+ "csukuangfj/vits-piper-es-glados-medium": _get_vits_piper,
499
+ "csukuangfj/vits-piper-es_ES-carlfm-x_low": _get_vits_piper,
500
+ "csukuangfj/vits-piper-es_ES-davefx-medium": _get_vits_piper,
501
+ # "csukuangfj/vits-piper-es_ES-mls_10246-low": _get_vits_piper,
502
+ # "csukuangfj/vits-piper-es_ES-mls_9972-low": _get_vits_piper,
503
+ "csukuangfj/vits-piper-es_ES-sharvard-medium": _get_vits_piper, # 2 speakers
504
+ "csukuangfj/vits-piper-es_MX-ald-medium": _get_vits_piper,
505
+ "csukuangfj/vits-piper-es_MX-claude-high": _get_vits_piper,
506
+ "csukuangfj/vits-mimic3-es_ES-m-ailabs_low": _get_vits_piper,
507
+ }
508
+
509
+ french_models = {
510
+ "csukuangfj/vits-coqui-fr-css10": _get_vits_piper,
511
+ # "csukuangfj/vits-piper-fr_FR-gilles-low": _get_vits_piper,
512
+ # "csukuangfj/vits-piper-fr_FR-mls_1840-low": _get_vits_piper,
513
+ "csukuangfj/vits-piper-fr_FR-mls-medium": _get_vits_piper, # 2 speakers, 0-female, 1-male
514
+ "csukuangfj/vits-piper-fr_FR-upmc-medium": _get_vits_piper, # 2 speakers, 0-female, 1-male
515
+ "csukuangfj/vits-piper-fr_FR-siwis-low": _get_vits_piper, # female
516
+ "csukuangfj/vits-piper-fr_FR-siwis-medium": _get_vits_piper,
517
+ "csukuangfj/vits-piper-fr_FR-tjiho-model1": _get_vits_piper,
518
+ "csukuangfj/vits-piper-fr_FR-tjiho-model2": _get_vits_piper,
519
+ "csukuangfj/vits-piper-fr_FR-tjiho-model3": _get_vits_piper,
520
+ }
521
+
522
+ ukrainian_models = {
523
+ "csukuangfj/vits-piper-uk_UA-lada-x_low": _get_vits_piper,
524
+ "csukuangfj/vits-coqui-uk-mai": _get_vits_piper,
525
+ # "csukuangfj/vits-piper-uk_UA-ukrainian_tts-medium": _get_vits_piper, # does not work somehow
526
+ }
527
+
528
+ russian_models = {
529
+ "csukuangfj/vits-piper-ru_RU-denis-medium": _get_vits_piper,
530
+ "csukuangfj/vits-piper-ru_RU-dmitri-medium": _get_vits_piper,
531
+ "csukuangfj/vits-piper-ru_RU-irina-medium": _get_vits_piper,
532
+ "csukuangfj/vits-piper-ru_RU-ruslan-medium": _get_vits_piper,
533
+ }
534
+
535
+ arabic_models = {
536
+ "csukuangfj/vits-piper-ar_JO-kareem-low": _get_vits_piper,
537
+ "csukuangfj/vits-piper-ar_JO-kareem-medium": _get_vits_piper,
538
+ }
539
+
540
+ catalan_models = {
541
+ "csukuangfj/vits-piper-ca_ES-upc_ona-x_low": _get_vits_piper,
542
+ "csukuangfj/vits-piper-ca_ES-upc_ona-medium": _get_vits_piper,
543
+ "csukuangfj/vits-piper-ca_ES-upc_pau-x_low": _get_vits_piper,
544
+ }
545
+
546
+ czech_models = {
547
+ "csukuangfj/vits-piper-cs_CZ-jirka-low": _get_vits_piper,
548
+ "csukuangfj/vits-piper-cs_CZ-jirka-medium": _get_vits_piper,
549
+ "csukuangfj/vits-coqui-cs-cv": _get_vits_piper,
550
+ }
551
+
552
+ danish_models = {
553
+ "csukuangfj/vits-coqui-da-cv": _get_vits_piper,
554
+ "csukuangfj/vits-piper-da_DK-talesyntese-medium": _get_vits_piper,
555
+ }
556
+
557
+ greek_models = {
558
+ "csukuangfj/vits-piper-el_GR-rapunzelina-low": _get_vits_piper,
559
+ # "csukuangfj/vits-mimic3-el_GR-rapunzelina_low": _get_vits_piper,
560
+ }
561
+
562
+ finnish_models = {
563
+ "csukuangfj/vits-coqui-fi-css10": _get_vits_piper,
564
+ "csukuangfj/vits-piper-fi_FI-harri-low": _get_vits_piper,
565
+ "csukuangfj/vits-piper-fi_FI-harri-medium": _get_vits_piper,
566
+ "csukuangfj/vits-mimic3-fi_FI-harri-tapani-ylilammi_low": _get_vits_piper,
567
+ }
568
+
569
+ hungarian_models = {
570
+ # "csukuangfj/vits-coqui-hu-css10": _get_vits_piper,
571
+ "csukuangfj/vits-piper-hu_HU-anna-medium": _get_vits_piper,
572
+ "csukuangfj/vits-piper-hu_HU-berta-medium": _get_vits_piper,
573
+ "csukuangfj/vits-piper-hu_HU-imre-medium": _get_vits_piper,
574
+ "csukuangfj/vits-mimic3-hu_HU-diana-majlinger_low": _get_vits_piper,
575
+ }
576
+
577
+ icelandic_models = {
578
+ "csukuangfj/vits-piper-is_IS-bui-medium": _get_vits_piper,
579
+ "csukuangfj/vits-piper-is_IS-salka-medium": _get_vits_piper,
580
+ "csukuangfj/vits-piper-is_IS-steinn-medium": _get_vits_piper,
581
+ "csukuangfj/vits-piper-is_IS-ugla-medium": _get_vits_piper,
582
+ }
583
+
584
+ italian_models = {
585
+ "csukuangfj/vits-piper-it_IT-riccardo-x_low": _get_vits_piper,
586
+ }
587
+
588
+ georgian_models = {
589
+ "csukuangfj/vits-piper-ka_GE-natia-medium": _get_vits_piper,
590
+ }
591
+
592
+ kazakh_models = {
593
+ "csukuangfj/vits-piper-kk_KZ-iseke-x_low": _get_vits_piper,
594
+ "csukuangfj/vits-piper-kk_KZ-issai-high": _get_vits_piper,
595
+ "csukuangfj/vits-piper-kk_KZ-raya-x_low": _get_vits_piper,
596
+ }
597
+
598
+ luxembourgish_models = {
599
+ "csukuangfj/vits-piper-lb_LU-marylux-medium": _get_vits_piper,
600
+ }
601
+
602
+ nepali_models = {
603
+ "csukuangfj/vits-piper-ne_NP-google-medium": _get_vits_piper,
604
+ "csukuangfj/vits-piper-ne_NP-google-x_low": _get_vits_piper,
605
+ "csukuangfj/vits-mimic3-ne_NP-ne-google_low": _get_vits_piper,
606
+ }
607
+
608
+ dutch_models = {
609
+ "csukuangfj/vits-coqui-nl-css10": _get_vits_piper,
610
+ "csukuangfj/vits-piper-nl_BE-nathalie-medium": _get_vits_piper,
611
+ "csukuangfj/vits-piper-nl_BE-nathalie-x_low": _get_vits_piper,
612
+ "csukuangfj/vits-piper-nl_BE-rdh-medium": _get_vits_piper,
613
+ "csukuangfj/vits-piper-nl_BE-rdh-x_low": _get_vits_piper,
614
+ "csukuangfj/vits-piper-nl_NL-mls-medium": _get_vits_piper,
615
+ "csukuangfj/vits-piper-nl_NL-mls_5809-low": _get_vits_piper,
616
+ "csukuangfj/vits-piper-nl_NL-mls_7432-low": _get_vits_piper,
617
+ }
618
+
619
+ norwegian_models = {
620
+ "csukuangfj/vits-piper-no_NO-talesyntese-medium": _get_vits_piper,
621
+ }
622
+
623
+ polish_models = {
624
+ "csukuangfj/vits-coqui-pl-mai_female": _get_vits_piper,
625
+ "csukuangfj/vits-piper-pl_PL-darkman-medium": _get_vits_piper,
626
+ "csukuangfj/vits-piper-pl_PL-gosia-medium": _get_vits_piper,
627
+ "csukuangfj/vits-piper-pl_PL-mc_speech-medium": _get_vits_piper,
628
+ # "csukuangfj/vits-piper-pl_PL-mls_6892-low": _get_vits_piper,
629
+ "csukuangfj/vits-mimic3-pl_PL-m-ailabs_low": _get_vits_piper,
630
+ }
631
+
632
+ portuguese_models = {
633
+ "csukuangfj/vits-coqui-pt-cv": _get_vits_piper,
634
+ "csukuangfj/vits-piper-pt_BR-edresson-low": _get_vits_piper,
635
+ "csukuangfj/vits-piper-pt_BR-faber-medium": _get_vits_piper,
636
+ "csukuangfj/vits-piper-pt_PT-tugao-medium": _get_vits_piper,
637
+ }
638
+
639
+ romanian_models = {
640
+ "csukuangfj/vits-coqui-ro-cv": _get_vits_piper,
641
+ "csukuangfj/vits-piper-ro_RO-mihai-medium": _get_vits_piper,
642
+ }
643
+
644
+
645
+ slovak_models = {
646
+ "csukuangfj/vits-coqui-sk-cv": _get_vits_piper,
647
+ "csukuangfj/vits-piper-sk_SK-lili-medium": _get_vits_piper,
648
+ }
649
+
650
+ serbian_models = {
651
+ "csukuangfj/vits-piper-sr_RS-serbski_institut-medium": _get_vits_piper,
652
+ }
653
+
654
+ swedish_models = {
655
+ "csukuangfj/vits-coqui-sv-cv": _get_vits_piper,
656
+ "csukuangfj/vits-piper-sv_SE-nst-medium": _get_vits_piper,
657
+ }
658
+
659
+ swahili_models = {
660
+ "csukuangfj/vits-piper-sw_CD-lanfrica-medium": _get_vits_piper,
661
+ }
662
+
663
+ turkish_models = {
664
+ "csukuangfj/vits-piper-tr_TR-dfki-medium": _get_vits_piper,
665
+ "csukuangfj/vits-piper-tr_TR-fahrettin-medium": _get_vits_piper,
666
+ }
667
+
668
+ vietnamese_models = {
669
+ "csukuangfj/vits-piper-vi_VN-25hours_single-low": _get_vits_piper,
670
+ "csukuangfj/vits-piper-vi_VN-vais1000-medium": _get_vits_piper,
671
+ "csukuangfj/vits-piper-vi_VN-vivos-x_low": _get_vits_piper,
672
+ "csukuangfj/vits-mimic3-vi_VN-vais1000_low": _get_vits_piper,
673
+ }
674
+
675
+ bulgarian_models = {
676
+ "csukuangfj/vits-coqui-bg-cv": _get_vits_piper,
677
+ }
678
+
679
+ estonian_models = {
680
+ "csukuangfj/vits-coqui-et-cv": _get_vits_piper,
681
+ }
682
+
683
+ irish_models = {
684
+ "csukuangfj/vits-coqui-ga-cv": _get_vits_piper,
685
+ }
686
+
687
+ croatian_models = {
688
+ "csukuangfj/vits-coqui-hr-cv": _get_vits_piper,
689
+ }
690
+
691
+ lithuanian_models = {
692
+ "csukuangfj/vits-coqui-lt-cv": _get_vits_piper,
693
+ }
694
+
695
+ latvian_models = {
696
+ "csukuangfj/vits-coqui-lv-cv": _get_vits_piper,
697
+ }
698
+
699
+ maltese_models = {
700
+ "csukuangfj/vits-coqui-mt-cv": _get_vits_piper,
701
+ }
702
+
703
+ slovenian_models = {
704
+ "csukuangfj/vits-piper-sl_SI-artur-medium": _get_vits_piper,
705
+ "csukuangfj/vits-coqui-sl-cv": _get_vits_piper,
706
+ }
707
+
708
+ # Bangla
709
+ bengali_models = {
710
+ "csukuangfj/vits-coqui-bn-custom_female": _get_vits_piper,
711
+ "csukuangfj/vits-mimic3-bn-multi_low": _get_vits_piper,
712
+ }
713
+
714
+ min_nan_models = {
715
+ "csukuangfj/vits-mms-nan": _get_vits_mms,
716
+ }
717
+
718
+ thai_models = {
719
+ "csukuangfj/vits-mms-tha": _get_vits_mms,
720
+ }
721
+
722
+ persian_models = {
723
+ "csukuangfj/vits-piper-fa_IR-amir-medium": _get_vits_piper,
724
+ "csukuangfj/vits-piper-fa_IR-gyro-medium": _get_vits_piper,
725
+ "csukuangfj/vits-mimic3-fa-haaniye_low": _get_vits_piper,
726
+ }
727
+
728
+ korean_models = {
729
+ "csukuangfj/vits-mimic3-ko_KO-kss_low": _get_vits_piper,
730
+ }
731
+
732
+
733
+ afrikaans_models = {
734
+ "csukuangfj/vits-mimic3-af_ZA-google-nwu_low": _get_vits_piper,
735
+ }
736
+
737
+ gujarati_models = {
738
+ "csukuangfj/vits-mimic3-gu_IN-cmu-indic_low": _get_vits_piper,
739
+ }
740
+
741
+ tswana_models = {
742
+ "csukuangfj/vits-mimic3-tn_ZA-google-nwu_low": _get_vits_piper,
743
+ }
744
+
745
+
746
+ language_to_models = {
747
+ "English": list(english_models.keys()),
748
+ "Chinese (Mandarin, 普通话)": list(chinese_models.keys()),
749
+ "Cantonese (粤语)": list(cantonese_models.keys()),
750
+ "Min-nan (闽南话)": list(min_nan_models.keys()),
751
+ "Arabic": list(arabic_models.keys()),
752
+ "Afrikaans": list(afrikaans_models.keys()),
753
+ "Bengali": list(bengali_models.keys()),
754
+ "Bulgarian": list(bulgarian_models.keys()),
755
+ "Catalan": list(catalan_models.keys()),
756
+ "Croatian": list(croatian_models.keys()),
757
+ "Czech": list(czech_models.keys()),
758
+ "Danish": list(danish_models.keys()),
759
+ "Dutch": list(dutch_models.keys()),
760
+ "Estonian": list(estonian_models.keys()),
761
+ "Finnish": list(finnish_models.keys()),
762
+ "French": list(french_models.keys()),
763
+ "Georgian": list(georgian_models.keys()),
764
+ "German": list(german_models.keys()),
765
+ "Greek": list(greek_models.keys()),
766
+ "Gujarati": list(gujarati_models.keys()),
767
+ "Hungarian": list(hungarian_models.keys()),
768
+ "Icelandic": list(icelandic_models.keys()),
769
+ "Irish": list(irish_models.keys()),
770
+ "Italian": list(italian_models.keys()),
771
+ "Kazakh": list(kazakh_models.keys()),
772
+ "Korean": list(korean_models.keys()),
773
+ "Latvian": list(latvian_models.keys()),
774
+ "Lithuanian": list(lithuanian_models.keys()),
775
+ "Luxembourgish": list(luxembourgish_models.keys()),
776
+ "Maltese": list(maltese_models.keys()),
777
+ "Nepali": list(nepali_models.keys()),
778
+ "Norwegian": list(norwegian_models.keys()),
779
+ "Persian": list(persian_models.keys()),
780
+ "Polish": list(polish_models.keys()),
781
+ "Portuguese": list(portuguese_models.keys()),
782
+ "Romanian": list(romanian_models.keys()),
783
+ "Russian": list(russian_models.keys()),
784
+ "Serbian": list(serbian_models.keys()),
785
+ "Slovak": list(slovak_models.keys()),
786
+ "Slovenian": list(slovenian_models.keys()),
787
+ "Spanish": list(spanish_models.keys()),
788
+ "Swahili": list(swahili_models.keys()),
789
+ "Swedish": list(swedish_models.keys()),
790
+ "Thai": list(thai_models.keys()),
791
+ "Tswana": list(tswana_models.keys()),
792
+ "Turkish": list(turkish_models.keys()),
793
+ "Ukrainian": list(ukrainian_models.keys()),
794
+ "Vietnamese": list(vietnamese_models.keys()),
795
+ }
requirements.txt ADDED
@@ -0,0 +1,4 @@
1
+ https://huggingface.co/csukuangfj/sherpa-onnx-wheels/resolve/main/sherpa_onnx-1.9.22-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
2
+ #sherpa-onnx
3
+
4
+ soundfile