Clebersla committed on
Commit
919858d
·
1 Parent(s): 4d5d913

Upload 12 files

Files changed (12)
  1. README.md +51 -6
  2. app.py +2086 -0
  3. config.py +204 -0
  4. gitattributes.txt +35 -0
  5. gitignore.txt +12 -0
  6. i18n.py +28 -0
  7. packages.txt +3 -0
  8. requirements.txt +22 -0
  9. rmvpe.py +432 -0
  10. run.sh +16 -0
  11. utils.py +151 -0
  12. vc_infer_pipeline.py +646 -0
README.md CHANGED
@@ -1,12 +1,57 @@
1
  ---
2
- title: RVC V2 Huggingface Version
3
- emoji: 🐠
4
- colorFrom: indigo
5
- colorTo: gray
6
  sdk: gradio
7
- sdk_version: 3.44.4
8
  app_file: app.py
9
  pinned: false
 
10
  ---
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Magic Vocals
3
+ emoji: 🦀
4
+ colorFrom: red
5
+ colorTo: pink
6
  sdk: gradio
7
+ sdk_version: 3.42.0
8
  app_file: app.py
9
  pinned: false
10
+ license: lgpl-3.0
11
  ---
12
 
13
+ ## 🔧 Pre-requisites
14
+
15
+ Before running the project, you must have the following tool installed on your machine:
16
+ * [Python v3.8.0](https://www.python.org/downloads/release/python-380/)
17
+
18
+ Also, you will need to clone the repository:
19
+
20
+ ```bash
21
+ # Clone the repository
22
+ git clone https://huggingface.co/spaces/mateuseap/magic-vocals/
23
+ # Enter the root directory
24
+ cd magic-vocals
25
+ ```
26
+
27
+ ## 🚀 How to run
28
+
29
+ After you've cloned the repository and entered the root directory, run the following commands:
30
+
31
+ ```bash
32
+ # Create and activate a Virtual Environment (make sure you're using Python v3.8.0 to do it)
33
+ python -m venv venv
34
+ . venv/bin/activate
35
+
36
+ # Make the shell script executable, then run it to configure and start the application
37
+ chmod +x run.sh
38
+ ./run.sh
39
+ ```
40
+
41
+ Once the shell script finishes, the application will be running at http://127.0.0.1:7860. Open the link in a browser to use the app:
42
+
43
+ ![Magic Vocals](https://i.imgur.com/V55oKv8.png)
44
+
45
+ **You only need to execute `run.sh` once.** After that, just activate the virtual environment and run the command below to start the app again:
46
+
47
+ ```bash
48
+ python app.py
49
+ ```
50
+
51
+ **`run.sh` is supported on the following operating systems:**
52
+
53
+
54
+ | OS | Supported |
55
+ |-----------|:---------:|
56
+ | `Windows` | ❌ |
57
+ | `Ubuntu` | ✅ |
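
The README's prerequisites (Python 3.8 and a non-Windows host for `run.sh`) could be checked before setup with a small script. This is only a sketch: `prerequisite_problems` is a hypothetical helper that is not part of this commit, and it assumes any 3.8.x patch release is acceptable.

```python
import platform
import sys


def prerequisite_problems(version=None, system=None):
    """Return a list of human-readable problems; empty means ready to run.

    Hypothetical helper: both arguments default to the running interpreter
    and host OS, and exist mainly so the check can be exercised in tests.
    """
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    system = platform.system() if system is None else system
    problems = []
    if version != (3, 8):  # assumption: any 3.8.x release is fine
        problems.append("expected Python 3.8, found %d.%d" % version)
    if system == "Windows":  # the README table marks Windows as unsupported
        problems.append("run.sh is not supported on Windows")
    return problems
```

Calling `prerequisite_problems()` with no arguments inspects the current environment.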
app.py ADDED
@@ -0,0 +1,2086 @@
1
+ import subprocess, torch, os, traceback, sys, warnings, shutil, numpy as np
2
+ from mega import Mega
3
+ os.environ["no_proxy"] = "localhost, 127.0.0.1, ::1"
4
+ import threading
5
+ from time import sleep
6
+ from subprocess import Popen
7
+ import faiss
8
+ from random import shuffle
9
+ import json, datetime, requests
10
+ from gtts import gTTS
11
+ now_dir = os.getcwd()
12
+ sys.path.append(now_dir)
13
+ tmp = os.path.join(now_dir, "TEMP")
14
+ shutil.rmtree(tmp, ignore_errors=True)
15
+ shutil.rmtree("%s/runtime/Lib/site-packages/infer_pack" % (now_dir), ignore_errors=True)
16
+ os.makedirs(tmp, exist_ok=True)
17
+ os.makedirs(os.path.join(now_dir, "logs"), exist_ok=True)
18
+ os.makedirs(os.path.join(now_dir, "weights"), exist_ok=True)
19
+ os.environ["TEMP"] = tmp
20
+ warnings.filterwarnings("ignore")
21
+ torch.manual_seed(114514)
22
+ from i18n import I18nAuto
23
+
24
+ import signal
25
+
26
+ import math
27
+
28
+ from utils import load_audio, CSVutil
29
+
30
+ global DoFormant, Quefrency, Timbre
31
+
32
+ if not os.path.isdir('csvdb/'):
33
+ os.makedirs('csvdb')
34
+ frmnt, stp = open("csvdb/formanting.csv", 'w'), open("csvdb/stop.csv", 'w')
35
+ frmnt.close()
36
+ stp.close()
37
+
38
+ try:
39
+ DoFormant, Quefrency, Timbre = CSVutil('csvdb/formanting.csv', 'r', 'formanting')
40
+ DoFormant = (
41
+ lambda DoFormant: True if DoFormant.lower() == 'true' else (False if DoFormant.lower() == 'false' else DoFormant)
42
+ )(DoFormant)
43
+ except (ValueError, TypeError, IndexError):
44
+ DoFormant, Quefrency, Timbre = False, 1.0, 1.0
45
+ CSVutil('csvdb/formanting.csv', 'w+', 'formanting', DoFormant, Quefrency, Timbre)
46
+
47
+ def download_models():
48
+ # Download hubert base model if not present
49
+ if not os.path.isfile('./hubert_base.pt'):
50
+ response = requests.get('https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/main/hubert_base.pt')
51
+
52
+ if response.status_code == 200:
53
+ with open('./hubert_base.pt', 'wb') as f:
54
+ f.write(response.content)
55
+ print("Downloaded hubert base model file successfully. File saved to ./hubert_base.pt.")
56
+ else:
57
+ raise Exception("Failed to download hubert base model file. Status code: " + str(response.status_code) + ".")
58
+
59
+ # Download rmvpe model if not present
60
+ if not os.path.isfile('./rmvpe.pt'):
61
+ response = requests.get('https://drive.usercontent.google.com/download?id=1Hkn4kNuVFRCNQwyxQFRtmzmMBGpQxptI&export=download&authuser=0&confirm=t&uuid=0b3a40de-465b-4c65-8c41-135b0b45c3f7&at=APZUnTV3lA3LnyTbeuduura6Dmi2:1693724254058')
62
+
63
+ if response.status_code == 200:
64
+ with open('./rmvpe.pt', 'wb') as f:
65
+ f.write(response.content)
66
+ print("Downloaded rmvpe model file successfully. File saved to ./rmvpe.pt.")
67
+ else:
68
+ raise Exception("Failed to download rmvpe model file. Status code: " + str(response.status_code) + ".")
69
+
70
+ download_models()
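
Review note: `download_models` holds each model fully in memory through `response.content` before writing it out. One possible alternative is to write the body in chunks to a temporary file and rename it into place at the end. The sketch below uses only the standard library; `write_chunks_atomically` is a hypothetical helper name, not something in this commit.

```python
import os
import tempfile


def write_chunks_atomically(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path via a temp file.

    Hypothetical helper: the temp file is renamed into place only after
    every chunk has been written, so an interrupted download does not
    leave a truncated file at dest_path.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    f.write(chunk)
        os.replace(tmp_path, dest_path)  # rename within the same directory
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

With `requests`, the chunks could come from `requests.get(url, stream=True).iter_content(chunk_size=1 << 20)`.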
71
+
72
+ print("\n-------------------------------\nRVC v2 Easy GUI (Local Edition)\n-------------------------------\n")
73
+
+ def formant_apply(qfrency, tmbre):
+     # Declare the module-level settings as global so the assignments below
+     # update them instead of creating unused locals.
+     global DoFormant, Quefrency, Timbre
+     Quefrency = qfrency
+     Timbre = tmbre
+     DoFormant = True
+     CSVutil('csvdb/formanting.csv', 'w+', 'formanting', DoFormant, qfrency, tmbre)
+
+     return ({"value": Quefrency, "__type__": "update"}, {"value": Timbre, "__type__": "update"})
81
+
82
+ def get_fshift_presets():
83
+ fshift_presets_list = []
84
+ for dirpath, _, filenames in os.walk("./formantshiftcfg/"):
85
+ for filename in filenames:
86
+ if filename.endswith(".txt"):
87
+ fshift_presets_list.append(os.path.join(dirpath,filename).replace('\\','/'))
88
+
89
+ if len(fshift_presets_list) > 0:
90
+ return fshift_presets_list
91
+ else:
92
+ return ''
93
+
94
+
95
+
96
+ def formant_enabled(cbox, qfrency, tmbre, frmntapply, formantpreset, formant_refresh_button):
97
+
98
+ if (cbox):
99
+
100
+ DoFormant = True
101
+ CSVutil('csvdb/formanting.csv', 'w+', 'formanting', DoFormant, qfrency, tmbre)
102
+ #print(f"is checked? - {cbox}\ngot {DoFormant}")
103
+
104
+ return (
105
+ {"value": True, "__type__": "update"},
106
+ {"visible": True, "__type__": "update"},
107
+ {"visible": True, "__type__": "update"},
108
+ {"visible": True, "__type__": "update"},
109
+ {"visible": True, "__type__": "update"},
110
+ {"visible": True, "__type__": "update"},
111
+ )
112
+
113
+
114
+ else:
115
+
116
+ DoFormant = False
117
+ CSVutil('csvdb/formanting.csv', 'w+', 'formanting', DoFormant, qfrency, tmbre)
118
+
119
+ #print(f"is checked? - {cbox}\ngot {DoFormant}")
120
+ return (
121
+ {"value": False, "__type__": "update"},
122
+ {"visible": False, "__type__": "update"},
123
+ {"visible": False, "__type__": "update"},
124
+ {"visible": False, "__type__": "update"},
125
+ {"visible": False, "__type__": "update"},
126
+ {"visible": False, "__type__": "update"},
127
+ {"visible": False, "__type__": "update"},
128
+ )
129
+
130
+
131
+
132
+ def preset_apply(preset, qfer, tmbr):
133
+ if str(preset) != '':
134
+ with open(str(preset), 'r') as p:
135
+ content = p.readlines()
136
+ qfer, tmbr = content[0].split('\n')[0], content[1]
137
+
138
+ formant_apply(qfer, tmbr)
139
+ else:
140
+ pass
141
+ return ({"value": qfer, "__type__": "update"}, {"value": tmbr, "__type__": "update"})
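
Review note: `preset_apply` reads the first two lines of a preset file and passes them on as raw strings, the second with its trailing newline still attached. Assuming each preset holds a numeric quefrency on line one and a numeric timbre on line two, the parsing could be sketched as follows; `parse_formant_preset` is a hypothetical name.

```python
def parse_formant_preset(text):
    """Parse preset text whose first two non-blank lines are quefrency and timbre.

    Hypothetical helper: returns both values as floats and raises
    ValueError when fewer than two values are present.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("preset needs a quefrency line and a timbre line")
    return float(lines[0]), float(lines[1])
```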
142
+
143
+ def update_fshift_presets(preset, qfrency, tmbre):
144
+
145
+ qfrency, tmbre = preset_apply(preset, qfrency, tmbre)
146
+
147
+ if (str(preset) != ''):
148
+ with open(str(preset), 'r') as p:
149
+ content = p.readlines()
150
+ qfrency, tmbre = content[0].split('\n')[0], content[1]
151
+
152
+ formant_apply(qfrency, tmbre)
153
+ else:
154
+ pass
155
+ return (
156
+ {"choices": get_fshift_presets(), "__type__": "update"},
157
+ {"value": qfrency, "__type__": "update"},
158
+ {"value": tmbre, "__type__": "update"},
159
+ )
160
+
161
+ i18n = I18nAuto()
162
+ #i18n.print()
163
+ # Check whether an NVIDIA GPU is available for training and accelerated inference
164
+ ngpu = torch.cuda.device_count()
165
+ gpu_infos = []
166
+ mem = []
167
+ if (not torch.cuda.is_available()) or ngpu == 0:
168
+ if_gpu_ok = False
169
+ else:
170
+ if_gpu_ok = False
171
+ for i in range(ngpu):
172
+ gpu_name = torch.cuda.get_device_name(i)
173
+ if (
174
+ "10" in gpu_name
175
+ or "16" in gpu_name
176
+ or "20" in gpu_name
177
+ or "30" in gpu_name
178
+ or "40" in gpu_name
179
+ or "A2" in gpu_name.upper()
180
+ or "A3" in gpu_name.upper()
181
+ or "A4" in gpu_name.upper()
182
+ or "P4" in gpu_name.upper()
183
+ or "A50" in gpu_name.upper()
184
+ or "A60" in gpu_name.upper()
185
+ or "70" in gpu_name
186
+ or "80" in gpu_name
187
+ or "90" in gpu_name
188
+ or "M4" in gpu_name.upper()
189
+ or "T4" in gpu_name.upper()
190
+ or "TITAN" in gpu_name.upper()
191
+ ): # A10#A100#V100#A40#P40#M40#K80#A4500
192
+ if_gpu_ok = True # at least one usable NVIDIA GPU
193
+ gpu_infos.append("%s\t%s" % (i, gpu_name))
194
+ mem.append(
195
+ int(
196
+ torch.cuda.get_device_properties(i).total_memory
197
+ / 1024
198
+ / 1024
199
+ / 1024
200
+ + 0.4
201
+ )
202
+ )
203
+ if if_gpu_ok == True and len(gpu_infos) > 0:
204
+ gpu_info = "\n".join(gpu_infos)
205
+ default_batch_size = min(mem) // 2
206
+ else:
207
+ gpu_info = i18n("很遗憾您这没有能用的显卡来支持您训练")
208
+ default_batch_size = 1
209
+ gpus = "-".join([i.split("\t")[0] for i in gpu_infos])  # take the whole index before the tab so two-digit GPU ids are kept
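
Review note: the default batch size above is derived from the smallest GPU's memory. The arithmetic can be isolated as a pure function for a quick check; `default_batch_size` is a hypothetical name, and the input is assumed to be a list of `total_memory` values in bytes.

```python
def default_batch_size(total_memory_bytes):
    """Return half of the smallest GPU's memory in GiB, or 1 with no GPUs.

    Mirrors the arithmetic in the diff: each size is converted to GiB,
    nudged up by 0.4, and truncated to an integer before halving.
    """
    if not total_memory_bytes:
        return 1
    gib = [int(b / 1024 / 1024 / 1024 + 0.4) for b in total_memory_bytes]
    return min(gib) // 2
```

The `+ 0.4` lets a card that reports slightly under its nominal size (for example 23.69 GiB) count as 24 GiB.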
210
+ from lib.infer_pack.models import (
211
+ SynthesizerTrnMs256NSFsid,
212
+ SynthesizerTrnMs256NSFsid_nono,
213
+ SynthesizerTrnMs768NSFsid,
214
+ SynthesizerTrnMs768NSFsid_nono,
215
+ )
216
+ import soundfile as sf
217
+ from fairseq import checkpoint_utils
218
+ import gradio as gr
219
+ import logging
220
+ from vc_infer_pipeline import VC
221
+ from config import Config
222
+
223
+ config = Config()
224
+ # from trainset_preprocess_pipeline import PreProcess
225
+ logging.getLogger("numba").setLevel(logging.WARNING)
226
+
227
+ hubert_model = None
228
+
229
+ def load_hubert():
230
+ global hubert_model
231
+ models, _, _ = checkpoint_utils.load_model_ensemble_and_task(
232
+ ["hubert_base.pt"],
233
+ suffix="",
234
+ )
235
+ hubert_model = models[0]
236
+ hubert_model = hubert_model.to(config.device)
237
+ if config.is_half:
238
+ hubert_model = hubert_model.half()
239
+ else:
240
+ hubert_model = hubert_model.float()
241
+ hubert_model.eval()
242
+
243
+
244
+ weight_root = "weights"
245
+ index_root = "logs"
246
+ names = []
247
+ for name in os.listdir(weight_root):
248
+ if name.endswith(".pth"):
249
+ names.append(name)
250
+ index_paths = []
251
+ for root, dirs, files in os.walk(index_root, topdown=False):
252
+ for name in files:
253
+ if name.endswith(".index") and "trained" not in name:
254
+ index_paths.append("%s/%s" % (root, name))
255
+
256
+
257
+
258
+ def vc_single(
259
+ sid,
260
+ input_audio_path,
261
+ f0_up_key,
262
+ f0_file,
263
+ f0_method,
264
+ file_index,
265
+ #file_index2,
266
+ # file_big_npy,
267
+ index_rate,
268
+ filter_radius,
269
+ resample_sr,
270
+ rms_mix_rate,
271
+ protect,
272
+ crepe_hop_length,
273
+ ): # spk_item, input_audio0, vc_transform0,f0_file,f0method0
274
+ global tgt_sr, net_g, vc, hubert_model, version
275
+ if input_audio_path is None:
276
+ return "You need to upload an audio", None
277
+ f0_up_key = int(f0_up_key)
278
+ try:
279
+ audio = load_audio(input_audio_path, 16000, DoFormant, Quefrency, Timbre)
280
+ audio_max = np.abs(audio).max() / 0.95
281
+ if audio_max > 1:
282
+ audio /= audio_max
283
+ times = [0, 0, 0]
284
+ if hubert_model == None:
285
+ load_hubert()
286
+ if_f0 = cpt.get("f0", 1)
287
+ file_index = (
288
+ (
289
+ file_index.strip(" ")
290
+ .strip('"')
291
+ .strip("\n")
292
+ .strip('"')
293
+ .strip(" ")
294
+ .replace("trained", "added")
295
+ )
296
+ ) # Guard against user mistakes: automatically replace "trained" with "added"
297
+ # file_big_npy = (
298
+ # file_big_npy.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
299
+ # )
300
+ audio_opt = vc.pipeline(
301
+ hubert_model,
302
+ net_g,
303
+ sid,
304
+ audio,
305
+ input_audio_path,
306
+ times,
307
+ f0_up_key,
308
+ f0_method,
309
+ file_index,
310
+ # file_big_npy,
311
+ index_rate,
312
+ if_f0,
313
+ filter_radius,
314
+ tgt_sr,
315
+ resample_sr,
316
+ rms_mix_rate,
317
+ version,
318
+ protect,
319
+ crepe_hop_length,
320
+ f0_file=f0_file,
321
+ )
322
+ if resample_sr >= 16000 and tgt_sr != resample_sr:
323
+ tgt_sr = resample_sr
324
+ index_info = (
325
+ "Using index:%s." % file_index
326
+ if os.path.exists(file_index)
327
+ else "Index not used."
328
+ )
329
+ return "Success.\n %s\nTime:\n npy:%ss, f0:%ss, infer:%ss" % (
330
+ index_info,
331
+ times[0],
332
+ times[1],
333
+ times[2],
334
+ ), (tgt_sr, audio_opt)
335
+ except:
336
+ info = traceback.format_exc()
337
+ print(info)
338
+ return info, (None, None)
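
Review note: before inference, `vc_single` scales the audio down only when its peak would exceed 0.95. A dependency-free sketch of that rule on a plain list of floats follows; `normalize_peak` and its `headroom` parameter are hypothetical names.

```python
def normalize_peak(samples, headroom=0.95):
    """Scale samples down only when the peak exceeds the headroom.

    Same rule as the diff: audio_max = peak / 0.95, and the signal is
    divided by audio_max only if audio_max > 1.
    """
    if not samples:
        return []
    audio_max = max(abs(s) for s in samples) / headroom
    if audio_max > 1:
        return [s / audio_max for s in samples]
    return list(samples)
```

Quiet input passes through unchanged, while input that is too loud ends up with a peak of exactly `headroom`.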
339
+
340
+
341
+ def vc_multi(
342
+ sid,
343
+ dir_path,
344
+ opt_root,
345
+ paths,
346
+ f0_up_key,
347
+ f0_method,
348
+ file_index,
349
+ file_index2,
350
+ # file_big_npy,
351
+ index_rate,
352
+ filter_radius,
353
+ resample_sr,
354
+ rms_mix_rate,
355
+ protect,
356
+ format1,
357
+ crepe_hop_length,
358
+ ):
359
+ try:
360
+ dir_path = (
361
+ dir_path.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
362
+ ) # Strip the spaces, quotes, and newlines users may copy along with the path
363
+ opt_root = opt_root.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
364
+ os.makedirs(opt_root, exist_ok=True)
365
+ try:
366
+ if dir_path != "":
367
+ paths = [os.path.join(dir_path, name) for name in os.listdir(dir_path)]
368
+ else:
369
+ paths = [path.name for path in paths]
370
+ except:
371
+ traceback.print_exc()
372
+ paths = [path.name for path in paths]
373
+ infos = []
374
+ for path in paths:
375
+ info, opt = vc_single(
376
+ sid,
377
+ path,
378
+ f0_up_key,
379
+ None,
380
+ f0_method,
381
+ file_index,
382
+ # file_big_npy,
383
+ index_rate,
384
+ filter_radius,
385
+ resample_sr,
386
+ rms_mix_rate,
387
+ protect,
388
+ crepe_hop_length
389
+ )
390
+ if "Success" in info:
391
+ try:
392
+ tgt_sr, audio_opt = opt
393
+ if format1 in ["wav", "flac"]:
394
+ sf.write(
395
+ "%s/%s.%s" % (opt_root, os.path.basename(path), format1),
396
+ audio_opt,
397
+ tgt_sr,
398
+ )
399
+ else:
400
+ path = "%s/%s.wav" % (opt_root, os.path.basename(path))
401
+ sf.write(
402
+ path,
403
+ audio_opt,
404
+ tgt_sr,
405
+ )
406
+ if os.path.exists(path):
407
+ os.system(
408
+ "ffmpeg -i %s -vn %s -q:a 2 -y"
409
+ % (path, path[:-4] + ".%s" % format1)
410
+ )
411
+ except:
412
+ info += traceback.format_exc()
413
+ infos.append("%s->%s" % (os.path.basename(path), info))
414
+ yield "\n".join(infos)
415
+ yield "\n".join(infos)
416
+ except:
417
+ yield traceback.format_exc()
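
Review note: `vc_multi` builds its ffmpeg call by string formatting and runs it through `os.system`, so a path containing spaces or quotes would be split by the shell. A sketch of an argument-list alternative follows; `build_ffmpeg_cmd` is a hypothetical helper, and it places the options before the output file, which is a reordering compared with the command in the diff.

```python
import os


def build_ffmpeg_cmd(wav_path, fmt):
    """Build the ffmpeg argument list for converting wav_path to fmt.

    Hypothetical helper: passing a list to subprocess.run (without
    shell=True) keeps paths with spaces or quotes intact.
    """
    out_path = os.path.splitext(wav_path)[0] + "." + fmt
    return ["ffmpeg", "-y", "-i", wav_path, "-vn", "-q:a", "2", out_path]
```

The list could then be run with `subprocess.run(build_ffmpeg_cmd(path, format1), check=True)`.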
418
+
419
+ # Only one voice can be loaded globally per tab
420
+ def get_vc(sid):
421
+ global n_spk, tgt_sr, net_g, vc, cpt, version
422
+ if sid == "" or sid == []:
423
+ global hubert_model
424
+ if hubert_model != None: # With polling in mind, check whether sid has switched from a loaded model to no model
425
+ print("clean_empty_cache")
426
+ del net_g, n_spk, vc, hubert_model, tgt_sr # ,cpt
427
+ hubert_model = net_g = n_spk = vc = tgt_sr = None
428
+ if torch.cuda.is_available():
429
+ torch.cuda.empty_cache()
430
+ ### Without the rebuild-and-delete steps below, the cache is not fully cleared
431
+ if_f0 = cpt.get("f0", 1)
432
+ version = cpt.get("version", "v1")
433
+ if version == "v1":
434
+ if if_f0 == 1:
435
+ net_g = SynthesizerTrnMs256NSFsid(
436
+ *cpt["config"], is_half=config.is_half
437
+ )
438
+ else:
439
+ net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
440
+ elif version == "v2":
441
+ if if_f0 == 1:
442
+ net_g = SynthesizerTrnMs768NSFsid(
443
+ *cpt["config"], is_half=config.is_half
444
+ )
445
+ else:
446
+ net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
447
+ del net_g, cpt
448
+ if torch.cuda.is_available():
449
+ torch.cuda.empty_cache()
450
+ cpt = None
451
+ return {"visible": False, "__type__": "update"}
452
+ person = "%s/%s" % (weight_root, sid)
453
+ print("loading %s" % person)
454
+ cpt = torch.load(person, map_location="cpu")
455
+ tgt_sr = cpt["config"][-1]
456
+ cpt["config"][-3] = cpt["weight"]["emb_g.weight"].shape[0] # n_spk
457
+ if_f0 = cpt.get("f0", 1)
458
+ version = cpt.get("version", "v1")
459
+ if version == "v1":
460
+ if if_f0 == 1:
461
+ net_g = SynthesizerTrnMs256NSFsid(*cpt["config"], is_half=config.is_half)
462
+ else:
463
+ net_g = SynthesizerTrnMs256NSFsid_nono(*cpt["config"])
464
+ elif version == "v2":
465
+ if if_f0 == 1:
466
+ net_g = SynthesizerTrnMs768NSFsid(*cpt["config"], is_half=config.is_half)
467
+ else:
468
+ net_g = SynthesizerTrnMs768NSFsid_nono(*cpt["config"])
469
+ del net_g.enc_q
470
+ print(net_g.load_state_dict(cpt["weight"], strict=False))
471
+ net_g.eval().to(config.device)
472
+ if config.is_half:
473
+ net_g = net_g.half()
474
+ else:
475
+ net_g = net_g.float()
476
+ vc = VC(tgt_sr, config)
477
+ n_spk = cpt["config"][-3]
478
+ return {"visible": False, "maximum": n_spk, "__type__": "update"}
479
+
480
+
481
+ def change_choices():
482
+ names = []
483
+ for name in os.listdir(weight_root):
484
+ if name.endswith(".pth"):
485
+ names.append(name)
486
+ index_paths = []
487
+ for root, dirs, files in os.walk(index_root, topdown=False):
488
+ for name in files:
489
+ if name.endswith(".index") and "trained" not in name:
490
+ index_paths.append("%s/%s" % (root, name))
491
+ return {"choices": sorted(names), "__type__": "update"}, {
492
+ "choices": sorted(index_paths),
493
+ "__type__": "update",
494
+ }
495
+
496
+
497
+ def clean():
498
+ return {"value": "", "__type__": "update"}
499
+
500
+
501
+ sr_dict = {
502
+ "32k": 32000,
503
+ "40k": 40000,
504
+ "48k": 48000,
505
+ }
506
+
507
+
508
+ def if_done(done, p):
509
+ while 1:
510
+ if p.poll() == None:
511
+ sleep(0.5)
512
+ else:
513
+ break
514
+ done[0] = True
515
+
516
+
517
+ def if_done_multi(done, ps):
518
+ while 1:
519
+ # poll() == None means the process has not finished
520
+ # Keep waiting as long as any process is still running
521
+ flag = 1
522
+ for p in ps:
523
+ if p.poll() == None:
524
+ flag = 0
525
+ sleep(0.5)
526
+ break
527
+ if flag == 1:
528
+ break
529
+ done[0] = True
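
Review note: `if_done` and `if_done_multi` both poll `Popen` objects until they exit. The same idea can be written as one loop, sketched below; `wait_for_all` and `interval` are hypothetical names, and the sketch returns exit codes rather than setting a shared `done` flag.

```python
import subprocess
import sys
import time


def wait_for_all(procs, interval=0.5):
    """Block until every process in procs has exited; return exit codes."""
    while any(p.poll() is None for p in procs):
        time.sleep(interval)
    return [p.returncode for p in procs]
```

For a single process, pass a one-element list.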
530
+
531
+
532
+ def preprocess_dataset(trainset_dir, exp_dir, sr, n_p):
533
+ sr = sr_dict[sr]
534
+ os.makedirs("%s/logs/%s" % (now_dir, exp_dir), exist_ok=True)
535
+ f = open("%s/logs/%s/preprocess.log" % (now_dir, exp_dir), "w")
536
+ f.close()
537
+ cmd = (
538
+ config.python_cmd
539
+ + " trainset_preprocess_pipeline_print.py %s %s %s %s/logs/%s "
540
+ % (trainset_dir, sr, n_p, now_dir, exp_dir)
541
+ + str(config.noparallel)
542
+ )
543
+ print(cmd)
544
+ p = Popen(cmd, shell=True) # , stdin=PIPE, stdout=PIPE,stderr=PIPE,cwd=now_dir
545
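+ ### Gradio only receives Popen output after the process has fully finished (without Gradio it streams line by line), so a separate log file is polled on a timer instead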
546
+ done = [False]
547
+ threading.Thread(
548
+ target=if_done,
549
+ args=(
550
+ done,
551
+ p,
552
+ ),
553
+ ).start()
554
+ while 1:
555
+ with open("%s/logs/%s/preprocess.log" % (now_dir, exp_dir), "r") as f:
556
+ yield (f.read())
557
+ sleep(1)
558
+ if done[0] == True:
559
+ break
560
+ with open("%s/logs/%s/preprocess.log" % (now_dir, exp_dir), "r") as f:
561
+ log = f.read()
562
+ print(log)
563
+ yield log
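
Review note: the preprocessing and feature-extraction functions repeat the same pattern of re-reading a log file once per second and yielding its contents to the UI until the subprocess finishes. A sketch of that pattern as a reusable generator follows; `stream_log`, `is_done`, and `interval` are hypothetical names.

```python
import time


def stream_log(path, is_done, interval=1.0):
    """Yield the full log text repeatedly, then once more after completion.

    Hypothetical generator: is_done is any zero-argument callable, for
    example lambda: p.poll() is not None for a Popen object p.
    """
    while not is_done():
        with open(path, "r") as f:
            yield f.read()
        time.sleep(interval)
    # Final read so output written just before exit is not missed.
    with open(path, "r") as f:
        yield f.read()
```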
564
+
565
+ # but2.click(extract_f0,[gpus6,np7,f0method8,if_f0_3,trainset_dir4],[info2])
566
+ def extract_f0_feature(gpus, n_p, f0method, if_f0, exp_dir, version19, echl):
567
+ gpus = gpus.split("-")
568
+ os.makedirs("%s/logs/%s" % (now_dir, exp_dir), exist_ok=True)
569
+ f = open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "w")
570
+ f.close()
571
+ if if_f0:
572
+ cmd = config.python_cmd + " extract_f0_print.py %s/logs/%s %s %s %s" % (
573
+ now_dir,
574
+ exp_dir,
575
+ n_p,
576
+ f0method,
577
+ echl,
578
+ )
579
+ print(cmd)
580
+ p = Popen(cmd, shell=True, cwd=now_dir) # , stdin=PIPE, stdout=PIPE,stderr=PIPE
581
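+ ### Gradio only receives Popen output after the process has fully finished (without Gradio it streams line by line), so a separate log file is polled on a timer instead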
582
+ done = [False]
583
+ threading.Thread(
584
+ target=if_done,
585
+ args=(
586
+ done,
587
+ p,
588
+ ),
589
+ ).start()
590
+ while 1:
591
+ with open(
592
+ "%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r"
593
+ ) as f:
594
+ yield (f.read())
595
+ sleep(1)
596
+ if done[0] == True:
597
+ break
598
+ with open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r") as f:
599
+ log = f.read()
600
+ print(log)
601
+ yield log
602
+ #### Launch a separate process for each part
603
+ """
604
+ n_part=int(sys.argv[1])
605
+ i_part=int(sys.argv[2])
606
+ i_gpu=sys.argv[3]
607
+ exp_dir=sys.argv[4]
608
+ os.environ["CUDA_VISIBLE_DEVICES"]=str(i_gpu)
609
+ """
610
+ leng = len(gpus)
611
+ ps = []
612
+ for idx, n_g in enumerate(gpus):
613
+ cmd = (
614
+ config.python_cmd
615
+ + " extract_feature_print.py %s %s %s %s %s/logs/%s %s"
616
+ % (
617
+ config.device,
618
+ leng,
619
+ idx,
620
+ n_g,
621
+ now_dir,
622
+ exp_dir,
623
+ version19,
624
+ )
625
+ )
626
+ print(cmd)
627
+ p = Popen(
628
+ cmd, shell=True, cwd=now_dir
629
+ ) # , shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, cwd=now_dir
630
+ ps.append(p)
631
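+ ### Gradio only receives Popen output after the process has fully finished (without Gradio it streams line by line), so a separate log file is polled on a timer instead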
632
+ done = [False]
633
+ threading.Thread(
634
+ target=if_done_multi,
635
+ args=(
636
+ done,
637
+ ps,
638
+ ),
639
+ ).start()
640
+ while 1:
641
+ with open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r") as f:
642
+ yield (f.read())
643
+ sleep(1)
644
+ if done[0] == True:
645
+ break
646
+ with open("%s/logs/%s/extract_f0_feature.log" % (now_dir, exp_dir), "r") as f:
647
+ log = f.read()
648
+ print(log)
649
+ yield log
650
+
651
+
652
+ def change_sr2(sr2, if_f0_3, version19):
653
+ path_str = "" if version19 == "v1" else "_v2"
654
+ f0_str = "f0" if if_f0_3 else ""
655
+ if_pretrained_generator_exist = os.access("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), os.F_OK)
656
+ if_pretrained_discriminator_exist = os.access("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), os.F_OK)
657
+ if (if_pretrained_generator_exist == False):
658
+ print("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), "not exist, will not use pretrained model")
659
+ if (if_pretrained_discriminator_exist == False):
660
+ print("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), "not exist, will not use pretrained model")
661
+ return (
662
+ ("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2)) if if_pretrained_generator_exist else "",
663
+ ("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2)) if if_pretrained_discriminator_exist else "",
664
+ {"visible": True, "__type__": "update"}
665
+ )
666
+
667
+ def change_version19(sr2, if_f0_3, version19):
668
+ path_str = "" if version19 == "v1" else "_v2"
669
+ f0_str = "f0" if if_f0_3 else ""
670
+ if_pretrained_generator_exist = os.access("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), os.F_OK)
671
+ if_pretrained_discriminator_exist = os.access("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), os.F_OK)
672
+ if (if_pretrained_generator_exist == False):
673
+ print("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2), "not exist, will not use pretrained model")
674
+ if (if_pretrained_discriminator_exist == False):
675
+ print("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2), "not exist, will not use pretrained model")
676
+ return (
677
+ ("pretrained%s/%sG%s.pth" % (path_str, f0_str, sr2)) if if_pretrained_generator_exist else "",
678
+ ("pretrained%s/%sD%s.pth" % (path_str, f0_str, sr2)) if if_pretrained_discriminator_exist else "",
679
+ )
680
+
681
+
682
+ def change_f0(if_f0_3, sr2, version19): # f0method8,pretrained_G14,pretrained_D15
683
+ path_str = "" if version19 == "v1" else "_v2"
684
+ if_pretrained_generator_exist = os.access("pretrained%s/f0G%s.pth" % (path_str, sr2), os.F_OK)
685
+ if_pretrained_discriminator_exist = os.access("pretrained%s/f0D%s.pth" % (path_str, sr2), os.F_OK)
686
+ if (if_pretrained_generator_exist == False):
687
+ print("pretrained%s/f0G%s.pth" % (path_str, sr2), "not exist, will not use pretrained model")
688
+ if (if_pretrained_discriminator_exist == False):
689
+ print("pretrained%s/f0D%s.pth" % (path_str, sr2), "not exist, will not use pretrained model")
690
+ if if_f0_3:
691
+ return (
692
+ {"visible": True, "__type__": "update"},
693
+ "pretrained%s/f0G%s.pth" % (path_str, sr2) if if_pretrained_generator_exist else "",
694
+ "pretrained%s/f0D%s.pth" % (path_str, sr2) if if_pretrained_discriminator_exist else "",
695
+ )
696
+ return (
697
+ {"visible": False, "__type__": "update"},
698
+ ("pretrained%s/G%s.pth" % (path_str, sr2)) if if_pretrained_generator_exist else "",
699
+ ("pretrained%s/D%s.pth" % (path_str, sr2)) if if_pretrained_discriminator_exist else "",
700
+ )
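
Review note: `change_sr2`, `change_version19`, and `change_f0` each rebuild the same pretrained checkpoint paths, and the non-f0 branch of `change_f0` returns `G`/`D` paths based on whether the `f0G`/`f0D` files exist. A sketch of a shared builder that checks the exact path it returns follows; `pretrained_paths` and its `exists` parameter are hypothetical, and `os.path.isfile` stands in for the `os.access(..., os.F_OK)` check used in the diff.

```python
import os


def pretrained_paths(sr2, if_f0, version, exists=os.path.isfile):
    """Return (generator_path, discriminator_path), "" for missing files.

    Hypothetical helper: `exists` is injectable so the lookup can be
    tested without real checkpoint files.
    """
    folder = "pretrained" if version == "v1" else "pretrained_v2"
    prefix = "f0" if if_f0 else ""
    paths = ["%s/%s%s%s.pth" % (folder, prefix, kind, sr2) for kind in ("G", "D")]
    return tuple(p if exists(p) else "" for p in paths)
```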
701
+
702
+
703
+ global log_interval
704
+
705
+
706
+ def set_log_interval(exp_dir, batch_size12):
707
+ log_interval = 1
708
+
709
+ folder_path = os.path.join(exp_dir, "1_16k_wavs")
710
+
711
+ if os.path.exists(folder_path) and os.path.isdir(folder_path):
712
+ wav_files = [f for f in os.listdir(folder_path) if f.endswith(".wav")]
713
+ if wav_files:
714
+ sample_size = len(wav_files)
715
+ log_interval = math.ceil(sample_size / batch_size12)
716
+ if log_interval > 1:
717
+ log_interval += 1
718
+ return log_interval
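
Review note: the log interval is the number of batches per epoch, plus one whenever that count exceeds one. The arithmetic can be checked on its own with a pure function; `log_interval_for` is a hypothetical name that takes the wav count directly instead of listing a folder.

```python
import math


def log_interval_for(sample_size, batch_size):
    """ceil(sample_size / batch_size), plus one when that exceeds 1."""
    if sample_size <= 0:
        return 1
    interval = math.ceil(sample_size / batch_size)
    return interval + 1 if interval > 1 else interval
```

For example, 100 wav files with a batch size of 8 give `ceil(12.5) = 13`, and the function returns 14.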
719
+
720
+ # but3.click(click_train,[exp_dir1,sr2,if_f0_3,save_epoch10,total_epoch11,batch_size12,if_save_latest13,pretrained_G14,pretrained_D15,gpus16])
721
+ def click_train(
722
+ exp_dir1,
723
+ sr2,
724
+ if_f0_3,
725
+ spk_id5,
726
+ save_epoch10,
727
+ total_epoch11,
728
+ batch_size12,
729
+ if_save_latest13,
730
+ pretrained_G14,
731
+ pretrained_D15,
732
+ gpus16,
733
+ if_cache_gpu17,
734
+ if_save_every_weights18,
735
+ version19,
736
+ ):
737
+ CSVutil('csvdb/stop.csv', 'w+', 'formanting', False)
738
+ # Generate the filelist
739
+ exp_dir = "%s/logs/%s" % (now_dir, exp_dir1)
740
+ os.makedirs(exp_dir, exist_ok=True)
741
+ gt_wavs_dir = "%s/0_gt_wavs" % (exp_dir)
742
+ feature_dir = (
743
+ "%s/3_feature256" % (exp_dir)
744
+ if version19 == "v1"
745
+ else "%s/3_feature768" % (exp_dir)
746
+ )
747
+
748
+ log_interval = set_log_interval(exp_dir, batch_size12)
749
+
750
+ if if_f0_3:
751
+ f0_dir = "%s/2a_f0" % (exp_dir)
752
+ f0nsf_dir = "%s/2b-f0nsf" % (exp_dir)
753
+ names = (
754
+ set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)])
755
+ & set([name.split(".")[0] for name in os.listdir(feature_dir)])
756
+ & set([name.split(".")[0] for name in os.listdir(f0_dir)])
757
+ & set([name.split(".")[0] for name in os.listdir(f0nsf_dir)])
758
+ )
759
+ else:
760
+ names = set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)]) & set(
761
+ [name.split(".")[0] for name in os.listdir(feature_dir)]
762
+ )
763
+ opt = []
764
+ for name in names:
765
+ if if_f0_3:
766
+ opt.append(
767
+ "%s/%s.wav|%s/%s.npy|%s/%s.wav.npy|%s/%s.wav.npy|%s"
768
+ % (
769
+ gt_wavs_dir.replace("\\", "\\\\"),
770
+ name,
771
+ feature_dir.replace("\\", "\\\\"),
772
+ name,
773
+ f0_dir.replace("\\", "\\\\"),
774
+ name,
775
+ f0nsf_dir.replace("\\", "\\\\"),
776
+ name,
777
+ spk_id5,
778
+ )
779
+ )
780
+ else:
781
+ opt.append(
782
+ "%s/%s.wav|%s/%s.npy|%s"
783
+ % (
784
+ gt_wavs_dir.replace("\\", "\\\\"),
785
+ name,
786
+ feature_dir.replace("\\", "\\\\"),
787
+ name,
788
+ spk_id5,
789
+ )
790
+ )
791
+ fea_dim = 256 if version19 == "v1" else 768
792
+ if if_f0_3:
793
+ for _ in range(2):
794
+ opt.append(
795
+ "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s/logs/mute/2a_f0/mute.wav.npy|%s/logs/mute/2b-f0nsf/mute.wav.npy|%s"
796
+ % (now_dir, sr2, now_dir, fea_dim, now_dir, now_dir, spk_id5)
797
+ )
798
+ else:
799
+ for _ in range(2):
800
+ opt.append(
801
+ "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s"
802
+ % (now_dir, sr2, now_dir, fea_dim, spk_id5)
803
+ )
804
+ shuffle(opt)
805
+ with open("%s/filelist.txt" % exp_dir, "w") as f:
806
+ f.write("\n".join(opt))
807
+ print("write filelist done")
808
+ # Generate config (not needed: no config has to be generated)
809
+ # cmd = python_cmd + " train_nsf_sim_cache_sid_load_pretrain.py -e mi-test -sr 40k -f0 1 -bs 4 -g 0 -te 10 -se 5 -pg pretrained/f0G40k.pth -pd pretrained/f0D40k.pth -l 1 -c 0"
810
+ print("use gpus:", gpus16)
811
+ if pretrained_G14 == "":
812
+ print("no pretrained Generator")
813
+ if pretrained_D15 == "":
814
+ print("no pretrained Discriminator")
815
+ if gpus16:
816
+ cmd = (
817
+ config.python_cmd
818
+ + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -g %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s -li %s"
819
+ % (
820
+ exp_dir1,
821
+ sr2,
822
+ 1 if if_f0_3 else 0,
823
+ batch_size12,
824
+ gpus16,
825
+ total_epoch11,
826
+ save_epoch10,
827
+ ("-pg %s" % pretrained_G14) if pretrained_G14 != "" else "",
828
+ ("-pd %s" % pretrained_D15) if pretrained_D15 != "" else "",
829
+ 1 if if_save_latest13 == True else 0,
830
+ 1 if if_cache_gpu17 == True else 0,
831
+ 1 if if_save_every_weights18 == True else 0,
832
+ version19,
833
+ log_interval,
834
+ )
835
+ )
836
+ else:
837
+ cmd = (
838
+ config.python_cmd
839
+ + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s -li %s"
840
+ % (
841
+ exp_dir1,
842
+ sr2,
843
+ 1 if if_f0_3 else 0,
844
+ batch_size12,
845
+ total_epoch11,
846
+ save_epoch10,
847
+ ("-pg %s" % pretrained_G14) if pretrained_G14 != "" else "",
848
+ ("-pd %s" % pretrained_D15) if pretrained_D15 != "" else "",
849
+ 1 if if_save_latest13 == True else 0,
850
+ 1 if if_cache_gpu17 == True else 0,
851
+ 1 if if_save_every_weights18 == True else 0,
852
+ version19,
853
+ log_interval,
854
+ )
855
+ )
856
+ print(cmd)
857
+ p = Popen(cmd, shell=True, cwd=now_dir)
858
+ global PID
859
+ PID = p.pid
860
+ p.wait()
861
+ return ("Training finished. You can check the training log in the console or train.log in the experiment folder.", {"visible": False, "__type__": "update"}, {"visible": True, "__type__": "update"})
862
+
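The filelist step above keeps only utterances whose stem exists in every feature directory, then writes one pipe-separated line per utterance. A minimal sketch of that idea follows. The helper names `common_stems` and `build_filelist` are hypothetical and do not exist in app.py, and the sketch leaves out the backslash escaping, the two "mute" entries, and the shuffle done by the original.

```python
import os
import tempfile


def common_stems(*dirs):
    """Return the file stems (text before the first dot) present in every directory."""
    stem_sets = [{name.split(".")[0] for name in os.listdir(d)} for d in dirs]
    return set.intersection(*stem_sets)


def build_filelist(gt_wavs_dir, feature_dir, spk_id, f0_dir=None, f0nsf_dir=None):
    """Build pipe-separated filelist entries in the same field order as the code above."""
    use_f0 = f0_dir is not None and f0nsf_dir is not None
    dirs = [gt_wavs_dir, feature_dir] + ([f0_dir, f0nsf_dir] if use_f0 else [])
    entries = []
    for name in sorted(common_stems(*dirs)):
        fields = [f"{gt_wavs_dir}/{name}.wav", f"{feature_dir}/{name}.npy"]
        if use_f0:
            # Pitch files keep the ".wav" part of the name, as in the original format string.
            fields += [f"{f0_dir}/{name}.wav.npy", f"{f0nsf_dir}/{name}.wav.npy"]
        fields.append(str(spk_id))
        entries.append("|".join(fields))
    return entries
```

Sorting the stems makes the result deterministic, which is convenient for testing; the caller can still shuffle the list before writing it.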
863
+
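The training command above is assembled by string formatting, which needs a placeholder when a pretrained path is empty. One possible alternative, shown here only as a sketch, builds an argument list and appends optional flags when they are set. The function name `build_train_args` is hypothetical; the script name and flags are the ones used in the command strings above.

```python
def build_train_args(exp_dir, sr, use_f0, batch_size, total_epoch, save_epoch,
                     version, gpus="", pretrained_g="", pretrained_d="",
                     save_latest=False, cache_gpu=False,
                     save_every_weights=False, log_interval=None):
    """Return the training script and its flags as a list of strings."""
    args = [
        "train_nsf_sim_cache_sid_load_pretrain.py",
        "-e", str(exp_dir),
        "-sr", str(sr),
        "-f0", "1" if use_f0 else "0",
        "-bs", str(batch_size),
    ]
    if gpus:
        args += ["-g", str(gpus)]
    args += ["-te", str(total_epoch), "-se", str(save_epoch)]
    # Optional pretrained checkpoints are simply left out when empty.
    if pretrained_g:
        args += ["-pg", pretrained_g]
    if pretrained_d:
        args += ["-pd", pretrained_d]
    args += [
        "-l", "1" if save_latest else "0",
        "-c", "1" if cache_gpu else "0",
        "-sw", "1" if save_every_weights else "0",
        "-v", str(version),
    ]
    if log_interval is not None:
        args += ["-li", str(log_interval)]
    return args
```

A list like this can be passed to `subprocess.Popen` without `shell=True`, prefixed with the Python executable, so values containing spaces do not need quoting.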
864
+ # but4.click(train_index, [exp_dir1], info3)
865
+ def train_index(exp_dir1, version19):
866
+ exp_dir = "%s/logs/%s" % (now_dir, exp_dir1)
867
+ os.makedirs(exp_dir, exist_ok=True)
868
+ feature_dir = (
869
+ "%s/3_feature256" % (exp_dir)
870
+ if version19 == "v1"
871
+ else "%s/3_feature768" % (exp_dir)
872
+ )
873
+ if os.path.exists(feature_dir) == False:
874
+ return "Please run feature extraction first!"
875
+ listdir_res = list(os.listdir(feature_dir))
876
+ if len(listdir_res) == 0:
877
+ return "Please run feature extraction first!"
878
+ npys = []
879
+ for name in sorted(listdir_res):
880
+ phone = np.load("%s/%s" % (feature_dir, name))
881
+ npys.append(phone)
882
+ big_npy = np.concatenate(npys, 0)
883
+ big_npy_idx = np.arange(big_npy.shape[0])
884
+ np.random.shuffle(big_npy_idx)
885
+ big_npy = big_npy[big_npy_idx]
886
+ np.save("%s/total_fea.npy" % exp_dir, big_npy)
887
+ # n_ivf = big_npy.shape[0] // 39
888
+ n_ivf = min(int(16 * np.sqrt(big_npy.shape[0])), big_npy.shape[0] // 39)
889
+ infos = []
890
+ infos.append("%s,%s" % (big_npy.shape, n_ivf))
891
+ yield "\n".join(infos)
892
+ index = faiss.index_factory(256 if version19 == "v1" else 768, "IVF%s,Flat" % n_ivf)
893
+ # index = faiss.index_factory(256if version19=="v1"else 768, "IVF%s,PQ128x4fs,RFlat"%n_ivf)
894
+ infos.append("training")
895
+ yield "\n".join(infos)
896
+ index_ivf = faiss.extract_index_ivf(index) #
897
+ index_ivf.nprobe = 1
898
+ index.train(big_npy)
899
+ faiss.write_index(
900
+ index,
901
+ "%s/trained_IVF%s_Flat_nprobe_%s_%s_%s.index"
902
+ % (exp_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
903
+ )
904
+ # faiss.write_index(index, '%s/trained_IVF%s_Flat_FastScan_%s.index'%(exp_dir,n_ivf,version19))
905
+ infos.append("adding")
906
+ yield "\n".join(infos)
907
+ batch_size_add = 8192
908
+ for i in range(0, big_npy.shape[0], batch_size_add):
909
+ index.add(big_npy[i : i + batch_size_add])
910
+ faiss.write_index(
911
+ index,
912
+ "%s/added_IVF%s_Flat_nprobe_%s_%s_%s.index"
913
+ % (exp_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
914
+ )
915
+ infos.append(
916
+ "Index built successfully: added_IVF%s_Flat_nprobe_%s_%s_%s.index"
917
+ % (n_ivf, index_ivf.nprobe, exp_dir1, version19)
918
+ )
919
+ # faiss.write_index(index, '%s/added_IVF%s_Flat_FastScan_%s.index'%(exp_dir,n_ivf,version19))
920
+ # infos.append("成功构建索引,added_IVF%s_Flat_FastScan_%s.index"%(n_ivf,version19))
921
+ yield "\n".join(infos)
922
+
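The number of IVF cells in `train_index` is `16 * sqrt(N)`, capped at `N // 39`, where `N` is the number of feature vectors. The cap of 39 appears to match faiss's default minimum number of training points per centroid, though that reading is an assumption. A small standalone sketch of the formula, with a hypothetical helper name:

```python
import math


def choose_n_ivf(n_vectors):
    """Number of IVF cells: 16 * sqrt(N), capped at N // 39."""
    return min(int(16 * math.sqrt(n_vectors)), n_vectors // 39)
```

For small datasets the cap dominates, and with fewer than 39 vectors the result is 0, so a caller would likely want to guard against that case before building the index.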
923
+
924
+ # but5.click(train1key, [exp_dir1, sr2, if_f0_3, trainset_dir4, spk_id5, gpus6, np7, f0method8, save_epoch10, total_epoch11, batch_size12, if_save_latest13, pretrained_G14, pretrained_D15, gpus16, if_cache_gpu17], info3)
925
+ def train1key(
926
+ exp_dir1,
927
+ sr2,
928
+ if_f0_3,
929
+ trainset_dir4,
930
+ spk_id5,
931
+ np7,
932
+ f0method8,
933
+ save_epoch10,
934
+ total_epoch11,
935
+ batch_size12,
936
+ if_save_latest13,
937
+ pretrained_G14,
938
+ pretrained_D15,
939
+ gpus16,
940
+ if_cache_gpu17,
941
+ if_save_every_weights18,
942
+ version19,
943
+ echl
944
+ ):
945
+ infos = []
946
+
947
+ def get_info_str(strr):
948
+ infos.append(strr)
949
+ return "\n".join(infos)
950
+
951
+ model_log_dir = "%s/logs/%s" % (now_dir, exp_dir1)
952
+ preprocess_log_path = "%s/preprocess.log" % model_log_dir
953
+ extract_f0_feature_log_path = "%s/extract_f0_feature.log" % model_log_dir
954
+ gt_wavs_dir = "%s/0_gt_wavs" % model_log_dir
955
+ feature_dir = (
956
+ "%s/3_feature256" % model_log_dir
957
+ if version19 == "v1"
958
+ else "%s/3_feature768" % model_log_dir
959
+ )
960
+
961
+ os.makedirs(model_log_dir, exist_ok=True)
962
+ ######### step1: preprocess the data
963
+ open(preprocess_log_path, "w").close()
964
+ cmd = (
965
+ config.python_cmd
966
+ + " trainset_preprocess_pipeline_print.py %s %s %s %s "
967
+ % (trainset_dir4, sr_dict[sr2], np7, model_log_dir)
968
+ + str(config.noparallel)
969
+ )
970
+ yield get_info_str(i18n("step1:正在处理数据"))
971
+ yield get_info_str(cmd)
972
+ p = Popen(cmd, shell=True)
973
+ p.wait()
974
+ with open(preprocess_log_path, "r") as f:
975
+ print(f.read())
976
+ ######### step2a: extract pitch
977
+ open(extract_f0_feature_log_path, "w").close()
978
+ if if_f0_3:
979
+ yield get_info_str("step2a: extracting pitch")
980
+ cmd = config.python_cmd + " extract_f0_print.py %s %s %s %s" % (
981
+ model_log_dir,
982
+ np7,
983
+ f0method8,
984
+ echl
985
+ )
986
+ yield get_info_str(cmd)
987
+ p = Popen(cmd, shell=True, cwd=now_dir)
988
+ p.wait()
989
+ with open(extract_f0_feature_log_path, "r") as f:
990
+ print(f.read())
991
+ else:
992
+ yield get_info_str(i18n("step2a:无需提取音高"))
993
+ ####### step2b: extract features
994
+ yield get_info_str(i18n("step2b:正在提取特征"))
995
+ gpus = gpus16.split("-")
996
+ leng = len(gpus)
997
+ ps = []
998
+ for idx, n_g in enumerate(gpus):
999
+ cmd = config.python_cmd + " extract_feature_print.py %s %s %s %s %s %s" % (
1000
+ config.device,
1001
+ leng,
1002
+ idx,
1003
+ n_g,
1004
+ model_log_dir,
1005
+ version19,
1006
+ )
1007
+ yield get_info_str(cmd)
1008
+ p = Popen(
1009
+ cmd, shell=True, cwd=now_dir
1010
+ ) # , shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE, cwd=now_dir
1011
+ ps.append(p)
1012
+ for p in ps:
1013
+ p.wait()
1014
+ with open(extract_f0_feature_log_path, "r") as f:
1015
+ print(f.read())
1016
+ ####### step3a: train the model
1017
+ yield get_info_str(i18n("step3a:正在训练模型"))
1018
+ # Generate the filelist
1019
+ if if_f0_3:
1020
+ f0_dir = "%s/2a_f0" % model_log_dir
1021
+ f0nsf_dir = "%s/2b-f0nsf" % model_log_dir
1022
+ names = (
1023
+ set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)])
1024
+ & set([name.split(".")[0] for name in os.listdir(feature_dir)])
1025
+ & set([name.split(".")[0] for name in os.listdir(f0_dir)])
1026
+ & set([name.split(".")[0] for name in os.listdir(f0nsf_dir)])
1027
+ )
1028
+ else:
1029
+ names = set([name.split(".")[0] for name in os.listdir(gt_wavs_dir)]) & set(
1030
+ [name.split(".")[0] for name in os.listdir(feature_dir)]
1031
+ )
1032
+ opt = []
1033
+ for name in names:
1034
+ if if_f0_3:
1035
+ opt.append(
1036
+ "%s/%s.wav|%s/%s.npy|%s/%s.wav.npy|%s/%s.wav.npy|%s"
1037
+ % (
1038
+ gt_wavs_dir.replace("\\", "\\\\"),
1039
+ name,
1040
+ feature_dir.replace("\\", "\\\\"),
1041
+ name,
1042
+ f0_dir.replace("\\", "\\\\"),
1043
+ name,
1044
+ f0nsf_dir.replace("\\", "\\\\"),
1045
+ name,
1046
+ spk_id5,
1047
+ )
1048
+ )
1049
+ else:
1050
+ opt.append(
1051
+ "%s/%s.wav|%s/%s.npy|%s"
1052
+ % (
1053
+ gt_wavs_dir.replace("\\", "\\\\"),
1054
+ name,
1055
+ feature_dir.replace("\\", "\\\\"),
1056
+ name,
1057
+ spk_id5,
1058
+ )
1059
+ )
1060
+ fea_dim = 256 if version19 == "v1" else 768
1061
+ if if_f0_3:
1062
+ for _ in range(2):
1063
+ opt.append(
1064
+ "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s/logs/mute/2a_f0/mute.wav.npy|%s/logs/mute/2b-f0nsf/mute.wav.npy|%s"
1065
+ % (now_dir, sr2, now_dir, fea_dim, now_dir, now_dir, spk_id5)
1066
+ )
1067
+ else:
1068
+ for _ in range(2):
1069
+ opt.append(
1070
+ "%s/logs/mute/0_gt_wavs/mute%s.wav|%s/logs/mute/3_feature%s/mute.npy|%s"
1071
+ % (now_dir, sr2, now_dir, fea_dim, spk_id5)
1072
+ )
1073
+ shuffle(opt)
1074
+ with open("%s/filelist.txt" % model_log_dir, "w") as f:
1075
+ f.write("\n".join(opt))
1076
+ yield get_info_str("write filelist done")
1077
+ if gpus16:
1078
+ cmd = (
1079
+ config.python_cmd
1080
+ +" train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -g %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s"
1081
+ % (
1082
+ exp_dir1,
1083
+ sr2,
1084
+ 1 if if_f0_3 else 0,
1085
+ batch_size12,
1086
+ gpus16,
1087
+ total_epoch11,
1088
+ save_epoch10,
1089
+ ("-pg %s" % pretrained_G14) if pretrained_G14 != "" else "",
1090
+ ("-pd %s" % pretrained_D15) if pretrained_D15 != "" else "",
1091
+ 1 if if_save_latest13 == True else 0,
1092
+ 1 if if_cache_gpu17 == True else 0,
1093
+ 1 if if_save_every_weights18 == True else 0,
1094
+ version19,
1095
+ )
1096
+ )
1097
+ else:
1098
+ cmd = (
1099
+ config.python_cmd
1100
+ + " train_nsf_sim_cache_sid_load_pretrain.py -e %s -sr %s -f0 %s -bs %s -te %s -se %s %s %s -l %s -c %s -sw %s -v %s"
1101
+ % (
1102
+ exp_dir1,
1103
+ sr2,
1104
+ 1 if if_f0_3 else 0,
1105
+ batch_size12,
1106
+ total_epoch11,
1107
+ save_epoch10,
1108
+ ("-pg %s" % pretrained_G14) if pretrained_G14 != "" else "",
1109
+ ("-pd %s" % pretrained_D15) if pretrained_D15 != "" else "",
1110
+ 1 if if_save_latest13 == True else 0,
1111
+ 1 if if_cache_gpu17 == True else 0,
1112
+ 1 if if_save_every_weights18 == True else 0,
1113
+ version19,
1114
+ )
1115
+ )
1116
+ yield get_info_str(cmd)
1117
+ p = Popen(cmd, shell=True, cwd=now_dir)
1118
+ p.wait()
1119
+ yield get_info_str(i18n("训练结束, 您可查看控制台训练日志或实验文件夹下的train.log"))
1120
+ ####### step3b: train the index
1121
+ npys = []
1122
+ listdir_res = list(os.listdir(feature_dir))
1123
+ for name in sorted(listdir_res):
1124
+ phone = np.load("%s/%s" % (feature_dir, name))
1125
+ npys.append(phone)
1126
+ big_npy = np.concatenate(npys, 0)
1127
+
1128
+ big_npy_idx = np.arange(big_npy.shape[0])
1129
+ np.random.shuffle(big_npy_idx)
1130
+ big_npy = big_npy[big_npy_idx]
1131
+ np.save("%s/total_fea.npy" % model_log_dir, big_npy)
1132
+
1133
+ # n_ivf = big_npy.shape[0] // 39
1134
+ n_ivf = min(int(16 * np.sqrt(big_npy.shape[0])), big_npy.shape[0] // 39)
1135
+ yield get_info_str("%s,%s" % (big_npy.shape, n_ivf))
1136
+ index = faiss.index_factory(256 if version19 == "v1" else 768, "IVF%s,Flat" % n_ivf)
1137
+ yield get_info_str("training index")
1138
+ index_ivf = faiss.extract_index_ivf(index) #
1139
+ index_ivf.nprobe = 1
1140
+ index.train(big_npy)
1141
+ faiss.write_index(
1142
+ index,
1143
+ "%s/trained_IVF%s_Flat_nprobe_%s_%s_%s.index"
1144
+ % (model_log_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
1145
+ )
1146
+ yield get_info_str("adding index")
1147
+ batch_size_add = 8192
1148
+ for i in range(0, big_npy.shape[0], batch_size_add):
1149
+ index.add(big_npy[i : i + batch_size_add])
1150
+ faiss.write_index(
1151
+ index,
1152
+ "%s/added_IVF%s_Flat_nprobe_%s_%s_%s.index"
1153
+ % (model_log_dir, n_ivf, index_ivf.nprobe, exp_dir1, version19),
1154
+ )
1155
+ yield get_info_str(
1156
+ "Index built successfully: added_IVF%s_Flat_nprobe_%s_%s_%s.index"
1157
+ % (n_ivf, index_ivf.nprobe, exp_dir1, version19)
1158
+ )
1159
+ yield get_info_str(i18n("全流程结束!"))
1160
+
1161
+
1162
+ def whethercrepeornah(radio):
1163
+ mango = radio in ('mangio-crepe', 'mangio-crepe-tiny')
1164
+ return ({"visible": mango, "__type__": "update"})
1165
+
1166
+ # ckpt_path2.change(change_info_,[ckpt_path2],[sr__,if_f0__])
1167
+ def change_info_(ckpt_path):
1168
+ if (
1169
+ os.path.exists(ckpt_path.replace(os.path.basename(ckpt_path), "train.log"))
1170
+ == False
1171
+ ):
1172
+ return {"__type__": "update"}, {"__type__": "update"}, {"__type__": "update"}
1173
+ try:
1174
+ with open(
1175
+ ckpt_path.replace(os.path.basename(ckpt_path), "train.log"), "r"
1176
+ ) as f:
1177
+ info = eval(f.read().strip("\n").split("\n")[0].split("\t")[-1])
1178
+ sr, f0 = info["sample_rate"], info["if_f0"]
1179
+ version = "v2" if ("version" in info and info["version"] == "v2") else "v1"
1180
+ return sr, str(f0), version
1181
+ except Exception:
1182
+ traceback.print_exc()
1183
+ return {"__type__": "update"}, {"__type__": "update"}, {"__type__": "update"}
1184
+
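`change_info_` reads the first line of `train.log` and evaluates the text after the last tab with `eval`. If that text is always a plain dict literal (an assumption about the log format), `ast.literal_eval` can parse it without executing arbitrary code. A minimal sketch with a hypothetical helper name:

```python
import ast


def parse_train_log_header(first_line):
    """Parse the dict literal after the last tab of the first train.log line."""
    info = ast.literal_eval(first_line.strip("\n").split("\t")[-1])
    # Anything other than an explicit "v2" is treated as "v1", as in change_info_.
    version = "v2" if info.get("version") == "v2" else "v1"
    return info["sample_rate"], str(info["if_f0"]), version
```

`ast.literal_eval` raises `ValueError` or `SyntaxError` on anything that is not a literal, so the existing `try`/`except` fallback would still apply.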
1185
+
1186
+ from lib.infer_pack.models_onnx import SynthesizerTrnMsNSFsidM
1187
+
1188
+
1189
+ def export_onnx(ModelPath, ExportedPath, MoeVS=True):
1190
+ cpt = torch.load(ModelPath, map_location="cpu")
1191
+ cpt["config"][-3] = cpt["weight"]["emb_g.weight"].shape[0] # n_spk
1192
+ hidden_channels = 256 if cpt.get("version", "v1") == "v1" else 768  # cpt["config"][-2]; hidden_channels, in preparation for 768-dim Vec
1193
+
1194
+ test_phone = torch.rand(1, 200, hidden_channels) # hidden unit
1195
+ test_phone_lengths = torch.tensor([200]).long() # hidden unit length (seemingly unused)
1196
+ test_pitch = torch.randint(size=(1, 200), low=5, high=255) # fundamental frequency (in Hz)
1197
+ test_pitchf = torch.rand(1, 200) # NSF fundamental frequency
1198
+ test_ds = torch.LongTensor([0]) # speaker ID
1199
+ test_rnd = torch.rand(1, 192, 200) # noise (adds a random factor)
1200
+
1201
+ device = "cpu" # device used for export (does not affect how the model is used)
1202
+
1203
+
1204
+ net_g = SynthesizerTrnMsNSFsidM(
1205
+ *cpt["config"], is_half=False,version=cpt.get("version","v1")
1206
+ ) # fp32 export (supporting fp16 in C++ requires manually rearranging memory, so fp16 is not used for now)
1207
+ net_g.load_state_dict(cpt["weight"], strict=False)
1208
+ input_names = ["phone", "phone_lengths", "pitch", "pitchf", "ds", "rnd"]
1209
+ output_names = [
1210
+ "audio",
1211
+ ]
1212
+ # net_g.construct_spkmixmap(n_speaker)  # multi-speaker mixed-track export
1213
+ torch.onnx.export(
1214
+ net_g,
1215
+ (
1216
+ test_phone.to(device),
1217
+ test_phone_lengths.to(device),
1218
+ test_pitch.to(device),
1219
+ test_pitchf.to(device),
1220
+ test_ds.to(device),
1221
+ test_rnd.to(device),
1222
+ ),
1223
+ ExportedPath,
1224
+ dynamic_axes={
1225
+ "phone": [1],
1226
+ "pitch": [1],
1227
+ "pitchf": [1],
1228
+ "rnd": [2],
1229
+ },
1230
+ do_constant_folding=False,
1231
+ opset_version=16,
1232
+ verbose=False,
1233
+ input_names=input_names,
1234
+ output_names=output_names,
1235
+ )
1236
+ return "Finished"
1237
+
1238
+ #region RVC WebUI App
1239
+
1240
+ def get_presets():
1241
+ data = None
1242
+ with open('../inference-presets.json', 'r') as file:
1243
+ data = json.load(file)
1244
+ preset_names = []
1245
+ for preset in data['presets']:
1246
+ preset_names.append(preset['name'])
1247
+
1248
+ return preset_names
1249
+
1250
+ def change_choices2():
1251
+ audio_files=[]
1252
+ for filename in os.listdir("./audios"):
1253
+ if filename.endswith(('.wav','.mp3','.ogg','.flac','.m4a','.aac','.mp4')):
1254
+ audio_files.append(os.path.join('./audios',filename).replace('\\', '/'))
1255
+ return {"choices": sorted(audio_files), "__type__": "update"}, {"__type__": "update"}
1256
+
1257
+ audio_files=[]
1258
+ for filename in os.listdir("./audios"):
1259
+ if filename.endswith(('.wav','.mp3','.ogg','.flac','.m4a','.aac','.mp4')):
1260
+ audio_files.append(os.path.join('./audios',filename).replace('\\', '/'))
1261
+
1262
+ def get_index():
1263
+ if check_for_name() != '':
1264
+ chosen_model=sorted(names)[0].split(".")[0]
1265
+ logs_path="./logs/"+chosen_model
1266
+ if os.path.exists(logs_path):
1267
+ for file in os.listdir(logs_path):
1268
+ if file.endswith(".index"):
1269
+ return os.path.join(logs_path, file)
1270
+ return ''
1271
+ else:
1272
+ return ''
1273
+
1274
+ def get_indexes():
1275
+ indexes_list=[]
1276
+ for dirpath, dirnames, filenames in os.walk("./logs/"):
1277
+ for filename in filenames:
1278
+ if filename.endswith(".index"):
1279
+ indexes_list.append(os.path.join(dirpath,filename))
1280
+ if len(indexes_list) > 0:
1281
+ return indexes_list
1282
+ else:
1283
+ return ''
1284
+
1285
+ def get_name():
1286
+ if len(audio_files) > 0:
1287
+ return sorted(audio_files)[0]
1288
+ else:
1289
+ return ''
1290
+
1291
+ def save_to_wav(record_button):
1292
+ if record_button is None:
1293
+ pass
1294
+ else:
1295
+ path_to_file=record_button
1296
+ new_name = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")+'.wav'
1297
+ new_path='./audios/'+new_name
1298
+ shutil.move(path_to_file,new_path)
1299
+ return new_path
1300
+
1301
+ def save_to_wav2(dropbox):
1302
+ file_path=dropbox.name
1303
+ shutil.move(file_path,'./audios')
1304
+ return os.path.join('./audios',os.path.basename(file_path))
1305
+
1306
+ def match_index(sid0):
1307
+ folder=sid0.split(".")[0]
1308
+ parent_dir="./logs/"+folder
1309
+ if os.path.exists(parent_dir):
1310
+ for filename in os.listdir(parent_dir):
1311
+ if filename.endswith(".index"):
1312
+ index_path=os.path.join(parent_dir,filename)
1313
+ return index_path
1314
+ else:
1315
+ return ''
1316
+
1317
+ def check_for_name():
1318
+ if len(names) > 0:
1319
+ return sorted(names)[0]
1320
+ else:
1321
+ return ''
1322
+
1323
+ def download_from_url(url, model):
1324
+ if url == '':
1325
+ return "URL cannot be left empty."
1326
+ if model =='':
1327
+ return "You need to name your model. For example: My-Model"
1328
+ url = url.strip()
1329
+ zip_dirs = ["zips", "unzips"]
1330
+ for directory in zip_dirs:
1331
+ if os.path.exists(directory):
1332
+ shutil.rmtree(directory)
1333
+ os.makedirs("zips", exist_ok=True)
1334
+ os.makedirs("unzips", exist_ok=True)
1335
+ zipfile = model + '.zip'
1336
+ zipfile_path = './zips/' + zipfile
1337
+ try:
1338
+ if "drive.google.com" in url:
1339
+ subprocess.run(["gdown", url, "--fuzzy", "-O", zipfile_path])
1340
+ elif "mega.nz" in url:
1341
+ m = Mega()
1342
+ m.download_url(url, './zips')
1343
+ else:
1344
+ subprocess.run(["wget", url, "-O", zipfile_path])
1345
+ for filename in os.listdir("./zips"):
1346
+ if filename.endswith(".zip"):
1347
+ zipfile_path = os.path.join("./zips/",filename)
1348
+ shutil.unpack_archive(zipfile_path, "./unzips", 'zip')
1349
+ else:
1350
+ return "No zipfile found."
1351
+ for root, dirs, files in os.walk('./unzips'):
1352
+ for file in files:
1353
+ file_path = os.path.join(root, file)
1354
+ if file.endswith(".index"):
1355
+ os.makedirs(f'./logs/{model}', exist_ok=True)
1356
+ shutil.copy2(file_path,f'./logs/{model}')
1357
+ elif "G_" not in file and "D_" not in file and file.endswith(".pth"):
1358
+ shutil.copy(file_path,f'./weights/{model}.pth')
1359
+ shutil.rmtree("zips")
1360
+ shutil.rmtree("unzips")
1361
+ return "Success."
1362
+ except Exception:
1363
+ return "There's been an error."
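`download_from_url` chooses a download tool by checking for a host name inside the URL. That dispatch can be isolated into a small pure function, which makes it easy to check without any network access. This is only a sketch; `pick_downloader` is a hypothetical name, and the returned labels stand in for the `gdown`, `Mega`, and `wget` branches above.

```python
def pick_downloader(url):
    """Return which download branch a URL would take, using the same substring checks."""
    url = url.strip()
    if "drive.google.com" in url:
        return "gdown"
    if "mega.nz" in url:
        return "mega"
    return "wget"
```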
1364
+ def success_message(face):
1365
+ return f'{face.name} has been uploaded.', 'None'
1366
+ def mouth(size, face, voice, faces):
1367
+ if size == 'Half':
1368
+ size = 2
1369
+ else:
1370
+ size = 1
1371
+ if faces == 'None':
1372
+ character = face.name
1373
+ else:
1374
+ if faces == 'Ben Shapiro':
1375
+ character = '/content/wav2lip-HD/inputs/ben-shapiro-10.mp4'
1376
+ elif faces == 'Andrew Tate':
1377
+ character = '/content/wav2lip-HD/inputs/tate-7.mp4'
1378
+ command = "python inference.py " \
1379
+ "--checkpoint_path checkpoints/wav2lip.pth " \
1380
+ f"--face {character} " \
1381
+ f"--audio {voice} " \
1382
+ "--pads 0 20 0 0 " \
1383
+ "--outfile /content/wav2lip-HD/outputs/result.mp4 " \
1384
+ "--fps 24 " \
1385
+ f"--resize_factor {size}"
1386
+ process = subprocess.Popen(command, shell=True, cwd='/content/wav2lip-HD/Wav2Lip-master')
1387
+ stdout, stderr = process.communicate()
1388
+ return '/content/wav2lip-HD/outputs/result.mp4', 'Animation completed.'
1389
+ eleven_voices = ['Adam','Antoni','Josh','Arnold','Sam','Bella','Rachel','Domi','Elli']
1390
+ eleven_voices_ids=['pNInz6obpgDQGcFmaJgB','ErXwobaYiN019PkySvjV','TxGEqnHWrfWFTfGW9XjX','VR6AewLTigWG4xSOukaG','yoZ06aMxZJJ28mfd3POQ','EXAVITQu4vr4xnSDxMaL','21m00Tcm4TlvDq8ikWAM','AZnzlk1XvdvUeBnXmlld','MF3mGyEYCl7XYWbV9V6O']
1391
+ chosen_voice = dict(zip(eleven_voices, eleven_voices_ids))
1392
+
1393
+ def stoptraining(mim):
1394
+ if int(mim) == 1:
1395
+ try:
1396
+ CSVutil('csvdb/stop.csv', 'w+', 'stop', 'True')
1397
+ os.kill(PID, signal.SIGTERM)
1398
+ except Exception as e:
1399
+ print(f"Couldn't click due to {e}")
1400
+ return (
1401
+ {"visible": False, "__type__": "update"},
1402
+ {"visible": True, "__type__": "update"},
1403
+ )
1404
+
1405
+
1406
+ def elevenTTS(xiapi, text, id, lang):
1407
+ if xiapi!= '' and id !='':
1408
+ choice = chosen_voice[id]
1409
+ CHUNK_SIZE = 1024
1410
+ url = f"https://api.elevenlabs.io/v1/text-to-speech/{choice}"
1411
+ headers = {
1412
+ "Accept": "audio/mpeg",
1413
+ "Content-Type": "application/json",
1414
+ "xi-api-key": xiapi
1415
+ }
1416
+ if lang == 'en':
1417
+ data = {
1418
+ "text": text,
1419
+ "model_id": "eleven_monolingual_v1",
1420
+ "voice_settings": {
1421
+ "stability": 0.5,
1422
+ "similarity_boost": 0.5
1423
+ }
1424
+ }
1425
+ else:
1426
+ data = {
1427
+ "text": text,
1428
+ "model_id": "eleven_multilingual_v1",
1429
+ "voice_settings": {
1430
+ "stability": 0.5,
1431
+ "similarity_boost": 0.5
1432
+ }
1433
+ }
1434
+
1435
+ response = requests.post(url, json=data, headers=headers)
1436
+ with open('./temp_eleven.mp3', 'wb') as f:
1437
+ for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
1438
+ if chunk:
1439
+ f.write(chunk)
1440
+ aud_path = save_to_wav('./temp_eleven.mp3')
1441
+ return aud_path, aud_path
1442
+ else:
1443
+ tts = gTTS(text, lang=lang)
1444
+ tts.save('./temp_gTTS.mp3')
1445
+ aud_path = save_to_wav('./temp_gTTS.mp3')
1446
+ return aud_path, aud_path
1447
+
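The two request bodies in `elevenTTS` differ only in `model_id`. A sketch of a helper that builds the payload once is shown below; `build_tts_payload` is a hypothetical name, while the model ids and voice settings are copied from the function above.

```python
def build_tts_payload(text, lang, stability=0.5, similarity_boost=0.5):
    """Build the JSON body for the text-to-speech request used in elevenTTS."""
    model_id = "eleven_monolingual_v1" if lang == "en" else "eleven_multilingual_v1"
    return {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,
            "similarity_boost": similarity_boost,
        },
    }
```

The result can be passed as `json=` to `requests.post`, as the original code does with its `data` dict.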
1448
+ def upload_to_dataset(files, dir):
1449
+ if dir == '':
1450
+ dir = './dataset'
1451
+ if not os.path.exists(dir):
1452
+ os.makedirs(dir)
1453
+ count = 0
1454
+ for file in files:
1455
+ path=file.name
1456
+ shutil.copy2(path,dir)
1457
+ count += 1
1458
+ return f' {count} files uploaded to {dir}.'
1459
+
1460
+ def zip_downloader(model):
1461
+ if not os.path.exists(f'./weights/{model}.pth'):
1462
+ return {"__type__": "update"}, f'Make sure the Voice Name is correct. I could not find {model}.pth'
1463
+ index_found = False
1464
+ for file in os.listdir(f'./logs/{model}'):
1465
+ if file.endswith('.index') and 'added' in file:
1466
+ log_file = file
1467
+ index_found = True
1468
+ if index_found:
1469
+ return [f'./weights/{model}.pth', f'./logs/{model}/{log_file}'], "Done"
1470
+ else:
1471
+ return f'./weights/{model}.pth', "Could not find Index file."
1472
+
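`zip_downloader` calls `os.listdir` on `./logs/{model}` without checking that the folder exists. A sketch of a lookup that tolerates a missing folder is given below. `find_added_index` is a hypothetical helper, and unlike the loop above it returns the first match in sorted order rather than the last one seen.

```python
import os
import tempfile


def find_added_index(log_dir):
    """Return the path of the first 'added' .index file in log_dir, or None."""
    if not os.path.isdir(log_dir):
        return None
    for name in sorted(os.listdir(log_dir)):
        if name.endswith(".index") and "added" in name:
            return os.path.join(log_dir, name)
    return None
```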
1473
+ with gr.Blocks(theme=gr.themes.Base(), title='Mangio-RVC-Web 💻') as app:
1474
+ with gr.Tabs():
1475
+ with gr.TabItem("Inference"):
1476
+ gr.HTML("<h1> RVC V2 Huggingface Version </h1>")
1477
+
1478
+ # Inference Preset Row
1479
+ # with gr.Row():
1480
+ # mangio_preset = gr.Dropdown(label="Inference Preset", choices=sorted(get_presets()))
1481
+ # mangio_preset_name_save = gr.Textbox(
1482
+ # label="Your preset name"
1483
+ # )
1484
+ # mangio_preset_save_btn = gr.Button('Save Preset', variant="primary")
1485
+
1486
+ # Other RVC stuff
1487
+ with gr.Row():
1488
+ sid0 = gr.Dropdown(label="1.Choose your Model.", choices=sorted(names), value=check_for_name())
1489
+ refresh_button = gr.Button("Refresh", variant="primary")
1490
+ if check_for_name() != '':
1491
+ get_vc(sorted(names)[0])
1492
+ vc_transform0 = gr.Number(label="Optional: You can change the pitch here or leave it at 0.", value=0)
1493
+ #clean_button = gr.Button(i18n("卸载音色省显存"), variant="primary")
1494
+ spk_item = gr.Slider(
1495
+ minimum=0,
1496
+ maximum=2333,
1497
+ step=1,
1498
+ label=i18n("请选择说话人id"),
1499
+ value=0,
1500
+ visible=False,
1501
+ interactive=True,
1502
+ )
1503
+ #clean_button.click(fn=clean, inputs=[], outputs=[sid0])
1504
+ sid0.change(
1505
+ fn=get_vc,
1506
+ inputs=[sid0],
1507
+ outputs=[spk_item],
1508
+ )
1509
+ but0 = gr.Button("Convert", variant="primary")
1510
+ with gr.Row():
1511
+ with gr.Column():
1512
+ with gr.Row():
1513
+ dropbox = gr.File(label="Drop your audio here & hit the Reload button.")
1514
+ with gr.Row():
1515
+ record_button=gr.Audio(source="microphone", label="OR Record audio.", type="filepath")
1516
+ with gr.Row():
1517
+ input_audio0 = gr.Dropdown(
1518
+ label="2.Choose your audio.",
1519
+ value="./audios/someguy.mp3",
1520
+ choices=audio_files
1521
+ )
1522
+ dropbox.upload(fn=save_to_wav2, inputs=[dropbox], outputs=[input_audio0])
1523
+ dropbox.upload(fn=change_choices2, inputs=[], outputs=[input_audio0])
1524
+ refresh_button2 = gr.Button("Refresh", variant="primary", size='sm')
1525
+ record_button.change(fn=save_to_wav, inputs=[record_button], outputs=[input_audio0])
1526
+ record_button.change(fn=change_choices2, inputs=[], outputs=[input_audio0])
1527
+ with gr.Row():
1528
+ with gr.Accordion('Text To Speech', open=False):
1529
+ with gr.Column():
1530
+ lang = gr.Radio(label='Chinese & Japanese do not work with ElevenLabs currently.',choices=['en','es','fr','pt','zh-CN','de','hi','ja'], value='en')
1531
+ api_box = gr.Textbox(label="Enter your API Key for ElevenLabs, or leave empty to use GoogleTTS", value='')
1532
+ elevenid=gr.Dropdown(label="Voice:", choices=eleven_voices)
1533
+ with gr.Column():
1534
+ tfs = gr.Textbox(label="Input your Text", interactive=True, value="This is a test.")
1535
+ tts_button = gr.Button(value="Speak")
1536
+ tts_button.click(fn=elevenTTS, inputs=[api_box,tfs, elevenid, lang], outputs=[record_button, input_audio0])
1537
+ with gr.Row():
1538
+ with gr.Accordion('Wav2Lip', open=False):
1539
+ with gr.Row():
1540
+ size = gr.Radio(label='Resolution:',choices=['Half','Full'])
1541
+ face = gr.UploadButton("Upload A Character",type='file')
1542
+ faces = gr.Dropdown(label="OR Choose one:", choices=['None','Ben Shapiro','Andrew Tate'])
1543
+ with gr.Row():
1544
+ preview = gr.Textbox(label="Status:",interactive=False)
1545
+ face.upload(fn=success_message,inputs=[face], outputs=[preview, faces])
1546
+ with gr.Row():
1547
+ animation = gr.Video(type='filepath')
1548
+ refresh_button2.click(fn=change_choices2, inputs=[], outputs=[input_audio0, animation])
1549
+ with gr.Row():
1550
+ animate_button = gr.Button('Animate')
1551
+
1552
+ with gr.Column():
1553
+ with gr.Accordion("Index Settings", open=False):
1554
+ file_index1 = gr.Dropdown(
1555
+ label="3. Path to your added.index file (if it didn't automatically find it.)",
1556
+ choices=get_indexes(),
1557
+ value=get_index(),
1558
+ interactive=True,
1559
+ )
1560
+ sid0.change(fn=match_index, inputs=[sid0],outputs=[file_index1])
1561
+ refresh_button.click(
1562
+ fn=change_choices, inputs=[], outputs=[sid0, file_index1]
1563
+ )
1564
+ # file_big_npy1 = gr.Textbox(
1565
+ # label=i18n("特征文件路径"),
1566
+ # value="E:\\codes\py39\\vits_vc_gpu_train\\logs\\mi-test-1key\\total_fea.npy",
1567
+ # interactive=True,
1568
+ # )
1569
+ index_rate1 = gr.Slider(
1570
+ minimum=0,
1571
+ maximum=1,
1572
+ label=i18n("检索特征占比"),
1573
+ value=0.66,
1574
+ interactive=True,
1575
+ )
1576
+ vc_output2 = gr.Audio(
1577
+ label="Output Audio (Click on the Three Dots in the Right Corner to Download)",
1578
+ type='filepath',
1579
+ interactive=False,
1580
+ )
1581
+ animate_button.click(fn=mouth, inputs=[size, face, vc_output2, faces], outputs=[animation, preview])
1582
+ with gr.Accordion("Advanced Settings", open=False):
1583
+ f0method0 = gr.Radio(
1584
+ label="Optional: Change the Pitch Extraction Algorithm.\nExtraction methods are sorted from 'worst quality' to 'best quality'.\nmangio-crepe may or may not be better than rmvpe in cases where 'smoothness' is more important, but rmvpe is the best overall.",
1585
+ choices=["pm", "dio", "crepe-tiny", "mangio-crepe-tiny", "crepe", "harvest", "mangio-crepe", "rmvpe"], # Fork Feature. Add Crepe-Tiny
1586
+ value="rmvpe",
1587
+ interactive=True,
1588
+ )
1589
+
1590
+ crepe_hop_length = gr.Slider(
1591
+ minimum=1,
1592
+ maximum=512,
1593
+ step=1,
1594
+ label="Mangio-Crepe Hop Length. Higher numbers will reduce the chance of extreme pitch changes but lower numbers will increase accuracy. 64-192 is a good range to experiment with.",
1595
+ value=120,
1596
+ interactive=True,
1597
+ visible=False,
1598
+ )
1599
+ f0method0.change(fn=whethercrepeornah, inputs=[f0method0], outputs=[crepe_hop_length])
1600
+ filter_radius0 = gr.Slider(
1601
+ minimum=0,
1602
+ maximum=7,
1603
+ label=i18n(">=3则使用对harvest音高识别的结果使用中值滤波,数值为滤波半径,使用可以削弱哑音"),
1604
+ value=3,
1605
+ step=1,
1606
+ interactive=True,
1607
+ )
1608
+ resample_sr0 = gr.Slider(
1609
+ minimum=0,
1610
+ maximum=48000,
1611
+ label=i18n("后处理重采样至最终采样率,0为不进行重采样"),
1612
+ value=0,
1613
+ step=1,
1614
+ interactive=True,
1615
+ visible=False
1616
+ )
1617
+ rms_mix_rate0 = gr.Slider(
1618
+ minimum=0,
1619
+ maximum=1,
1620
+ label=i18n("输入源音量包络替换输出音量包络融合比例,越靠近1越使用输出包络"),
1621
+ value=0.21,
1622
+ interactive=True,
1623
+ )
1624
+ protect0 = gr.Slider(
1625
+ minimum=0,
1626
+ maximum=0.5,
1627
+ label=i18n("保护清辅音和呼吸声,防止电音撕裂等artifact,拉满0.5不开启,调低加大保护力度但可能降低索引效果"),
1628
+ value=0.33,
1629
+ step=0.01,
1630
+ interactive=True,
1631
+ )
1632
+ formanting = gr.Checkbox(
1633
+ value=bool(DoFormant),
1634
+ label="[EXPERIMENTAL] Formant shift inference audio",
1635
+ info="Used for male to female and vice-versa conversions",
1636
+ interactive=True,
1637
+ visible=True,
1638
+ )
1639
+
1640
+ formant_preset = gr.Dropdown(
1641
+ value='',
1642
+ choices=get_fshift_presets(),
1643
+ label="browse presets for formanting",
1644
+ visible=bool(DoFormant),
1645
+ )
1646
+ formant_refresh_button = gr.Button(
1647
+ value='\U0001f504',
1648
+ visible=bool(DoFormant),
1649
+ variant='primary',
1650
+ )
1651
+ #formant_refresh_button = ToolButton( elem_id='1')
1652
+ #create_refresh_button(formant_preset, lambda: {"choices": formant_preset}, "refresh_list_shiftpresets")
1653
+
1654
+ qfrency = gr.Slider(
1655
+ value=Quefrency,
1656
+ info="Default value is 1.0",
1657
+ label="Quefrency for formant shifting",
1658
+ minimum=0.0,
1659
+ maximum=16.0,
1660
+ step=0.1,
1661
+ visible=bool(DoFormant),
1662
+ interactive=True,
1663
+ )
1664
+ tmbre = gr.Slider(
1665
+ value=Timbre,
1666
+ info="Default value is 1.0",
1667
+ label="Timbre for formant shifting",
1668
+ minimum=0.0,
1669
+ maximum=16.0,
1670
+ step=0.1,
1671
+ visible=bool(DoFormant),
1672
+ interactive=True,
1673
+ )
1674
+
1675
+ formant_preset.change(fn=preset_apply, inputs=[formant_preset, qfrency, tmbre], outputs=[qfrency, tmbre])
1676
+ frmntbut = gr.Button("Apply", variant="primary", visible=bool(DoFormant))
1677
+ formanting.change(fn=formant_enabled,inputs=[formanting,qfrency,tmbre,frmntbut,formant_preset,formant_refresh_button],outputs=[formanting,qfrency,tmbre,frmntbut,formant_preset,formant_refresh_button])
1678
+ frmntbut.click(fn=formant_apply,inputs=[qfrency, tmbre], outputs=[qfrency, tmbre])
1679
+ formant_refresh_button.click(fn=update_fshift_presets,inputs=[formant_preset, qfrency, tmbre],outputs=[formant_preset, qfrency, tmbre])
1680
+ with gr.Row():
1681
+ vc_output1 = gr.Textbox("")
1682
+ f0_file = gr.File(label=i18n("F0曲线文件, 可选, 一行一个音高, 代替默认F0及升降调"), visible=False)
1683
+
1684
+ but0.click(
1685
+ vc_single,
1686
+ [
1687
+ spk_item,
1688
+ input_audio0,
1689
+ vc_transform0,
1690
+ f0_file,
1691
+ f0method0,
1692
+ file_index1,
1693
+ # file_index2,
1694
+ # file_big_npy1,
1695
+ index_rate1,
1696
+ filter_radius0,
1697
+ resample_sr0,
1698
+ rms_mix_rate0,
1699
+ protect0,
1700
+ crepe_hop_length
1701
+ ],
1702
+ [vc_output1, vc_output2],
1703
+ )
1704
+
1705
+ with gr.Accordion("Batch Conversion",open=False):
1706
+ with gr.Row():
1707
+ with gr.Column():
1708
+ vc_transform1 = gr.Number(
1709
+ label=i18n("变调(整数, 半音数量, 升八度12降八度-12)"), value=0
1710
+ )
1711
+ opt_input = gr.Textbox(label=i18n("指定输出文件夹"), value="opt")
1712
+ f0method1 = gr.Radio(
1713
+ label=i18n(
1714
+ "选择音高提取算法,输入歌声可用pm提速,harvest低音好但巨慢无比,crepe效果好但吃GPU"
1715
+ ),
1716
+ choices=["pm", "harvest", "crepe", "rmvpe"],
1717
+ value="rmvpe",
1718
+ interactive=True,
1719
+ )
1720
+ filter_radius1 = gr.Slider(
1721
+ minimum=0,
1722
+ maximum=7,
1723
+ label=i18n(">=3则使用对harvest音高识别的结果使用中值滤波,数值为滤波半径,使用可以削弱哑音"),
1724
+ value=3,
1725
+ step=1,
1726
+ interactive=True,
1727
+ )
1728
+ with gr.Column():
1729
+ file_index3 = gr.Textbox(
1730
+ label=i18n("特征检索库文件路径,为空则使用下拉的选择结果"),
1731
+ value="",
1732
+ interactive=True,
1733
+ )
1734
+ file_index4 = gr.Dropdown(
1735
+ label=i18n("自动检测index路径,下拉式选择(dropdown)"),
1736
+ choices=sorted(index_paths),
1737
+ interactive=True,
1738
+ )
1739
+ refresh_button.click(
1740
+ fn=lambda: change_choices()[1],
1741
+ inputs=[],
1742
+ outputs=file_index4,
1743
+ )
1744
+ # file_big_npy2 = gr.Textbox(
1745
+ # label=i18n("特征文件路径"),
1746
+ # value="E:\\codes\\py39\\vits_vc_gpu_train\\logs\\mi-test-1key\\total_fea.npy",
1747
+ # interactive=True,
1748
+ # )
1749
+ index_rate2 = gr.Slider(
1750
+ minimum=0,
1751
+ maximum=1,
1752
+ label=i18n("检索特征占比"),
1753
+ value=1,
1754
+ interactive=True,
1755
+ )
1756
+ with gr.Column():
1757
+ resample_sr1 = gr.Slider(
1758
+ minimum=0,
1759
+ maximum=48000,
1760
+ label=i18n("后处理重采样至最终采样率,0为不进行重采样"),
1761
+ value=0,
1762
+ step=1,
1763
+ interactive=True,
1764
+ )
1765
+ rms_mix_rate1 = gr.Slider(
1766
+ minimum=0,
1767
+ maximum=1,
1768
+ label=i18n("输入源音量包络替换输出音量包络融合比例,越靠近1越使用输出包络"),
1769
+ value=1,
1770
+ interactive=True,
1771
+ )
1772
+ protect1 = gr.Slider(
1773
+ minimum=0,
1774
+ maximum=0.5,
1775
+ label=i18n(
1776
+ "保护清辅音和呼吸声,防止电音撕裂等artifact,拉满0.5不开启,调低加大保护力度但可能降低索引效果"
1777
+ ),
1778
+ value=0.33,
1779
+ step=0.01,
1780
+ interactive=True,
1781
+ )
1782
+ with gr.Column():
1783
+ dir_input = gr.Textbox(
1784
+ label=i18n("输入待处理音频文件夹路径(去文件管理器地址栏拷就行了)"),
1785
+ value="E:\\codes\\py39\\test-20230416b\\todo-songs",
1786
+ )
1787
+ inputs = gr.File(
1788
+ file_count="multiple", label=i18n("也可批量输入音频文件, 二选一, 优先读文件夹")
1789
+ )
1790
+ with gr.Row():
1791
+ format1 = gr.Radio(
1792
+ label=i18n("导出文件格式"),
1793
+ choices=["wav", "flac", "mp3", "m4a"],
1794
+ value="flac",
1795
+ interactive=True,
1796
+ )
1797
+ but1 = gr.Button(i18n("转换"), variant="primary")
1798
+ vc_output3 = gr.Textbox(label=i18n("输出信息"))
1799
+ but1.click(
1800
+ vc_multi,
1801
+ [
1802
+ spk_item,
1803
+ dir_input,
1804
+ opt_input,
1805
+ inputs,
1806
+ vc_transform1,
1807
+ f0method1,
1808
+ file_index3,
1809
+ file_index4,
1810
+ # file_big_npy2,
1811
+ index_rate2,
1812
+ filter_radius1,
1813
+ resample_sr1,
1814
+ rms_mix_rate1,
1815
+ protect1,
1816
+ format1,
1817
+ crepe_hop_length,
1818
+ ],
1819
+ [vc_output3],
1820
+ )
1821
+ but1.click(fn=lambda: easy_uploader.clear())
1822
+ with gr.TabItem("Download Model"):
1823
+ with gr.Row():
1824
+ url=gr.Textbox(label="Enter the URL to the Model:")
1825
+ with gr.Row():
1826
+ model = gr.Textbox(label="Name your model:")
1827
+ download_button=gr.Button("Download")
1828
+ with gr.Row():
1829
+ status_bar=gr.Textbox(label="")
1830
+ download_button.click(fn=download_from_url, inputs=[url, model], outputs=[status_bar])
1831
+ with gr.Row():
1832
+ gr.Markdown(
1833
+ """
1834
+ Made with ❤️ by [Alice Oliveira](https://github.com/aliceoq) | Hosted with ❤️ by [Mateus Elias](https://github.com/mateuseap)
1835
+ """
1836
+ )
1837
+
1838
+ def has_two_files_in_pretrained_folder():
1839
+ pretrained_folder = "./pretrained/"
1840
+ if not os.path.exists(pretrained_folder):
1841
+ return False
1842
+
1843
+ files_in_folder = os.listdir(pretrained_folder)
1844
+ num_files = len(files_in_folder)
1845
+ return num_files >= 2
1846
+
1847
+ if has_two_files_in_pretrained_folder():
1848
+ print("Pretrained weights are downloaded. Training tab enabled!\n-------------------------------")
1849
+ with gr.TabItem("Train", visible=False):
1850
+ with gr.Row():
1851
+ with gr.Column():
1852
+ exp_dir1 = gr.Textbox(label="Voice Name:", value="My-Voice")
1853
+ sr2 = gr.Radio(
1854
+ label=i18n("目标采样率"),
1855
+ choices=["40k", "48k"],
1856
+ value="40k",
1857
+ interactive=True,
1858
+ visible=False
1859
+ )
1860
+ if_f0_3 = gr.Radio(
1861
+ label=i18n("模型是否带音高指导(唱歌一定要, 语音可以不要)"),
1862
+ choices=[True, False],
1863
+ value=True,
1864
+ interactive=True,
1865
+ visible=False
1866
+ )
1867
+ version19 = gr.Radio(
1868
+ label="RVC version",
1869
+ choices=["v1", "v2"],
1870
+ value="v2",
1871
+ interactive=True,
1872
+ visible=False,
1873
+ )
1874
+ np7 = gr.Slider(
1875
+ minimum=0,
1876
+ maximum=config.n_cpu,
1877
+ step=1,
1878
+ label="# of CPUs for data processing (Leave as it is)",
1879
+ value=config.n_cpu,
1880
+ interactive=True,
1881
+ visible=True
1882
+ )
1883
+ trainset_dir4 = gr.Textbox(label="Path to your dataset (audios, not zip):", value="./dataset")
1884
+ easy_uploader = gr.Files(label='OR Drop your audios here. They will be uploaded in your dataset path above.',file_types=['audio'])
1885
+ but1 = gr.Button("1. Process The Dataset", variant="primary")
1886
+ info1 = gr.Textbox(label="Status (wait until it says 'end preprocess'):", value="")
1887
+ easy_uploader.upload(fn=upload_to_dataset, inputs=[easy_uploader, trainset_dir4], outputs=[info1])
1888
+ but1.click(
1889
+ preprocess_dataset, [trainset_dir4, exp_dir1, sr2, np7], [info1]
1890
+ )
1891
+ with gr.Column():
1892
+ spk_id5 = gr.Slider(
1893
+ minimum=0,
1894
+ maximum=4,
1895
+ step=1,
1896
+ label=i18n("请指定说话人id"),
1897
+ value=0,
1898
+ interactive=True,
1899
+ visible=False
1900
+ )
1901
+ with gr.Accordion('GPU Settings', open=False, visible=False):
1902
+ gpus6 = gr.Textbox(
1903
+ label=i18n("以-分隔输入使用的卡号, 例如 0-1-2 使用卡0和卡1和卡2"),
1904
+ value=gpus,
1905
+ interactive=True,
1906
+ visible=False
1907
+ )
1908
+ gpu_info9 = gr.Textbox(label=i18n("显卡信息"), value=gpu_info)
1909
+ f0method8 = gr.Radio(
1910
+ label=i18n(
1911
+ "选择音高提取算法:输入歌声可用pm提速,高质量语音但CPU差可用dio提速,harvest质量更好但慢"
1912
+ ),
1913
+ choices=["harvest","crepe", "mangio-crepe", "rmvpe"], # Fork feature: Crepe on f0 extraction for training.
1914
+ value="rmvpe",
1915
+ interactive=True,
1916
+ )
1917
+
1918
+ extraction_crepe_hop_length = gr.Slider(
1919
+ minimum=1,
1920
+ maximum=512,
1921
+ step=1,
1922
+ label=i18n("crepe_hop_length"),
1923
+ value=128,
1924
+ interactive=True,
1925
+ visible=False,
1926
+ )
1927
+ f0method8.change(fn=whethercrepeornah, inputs=[f0method8], outputs=[extraction_crepe_hop_length])
1928
+ but2 = gr.Button("2. Pitch Extraction", variant="primary")
1929
+ info2 = gr.Textbox(label="Status (Check the Colab Notebook's cell output):", value="", max_lines=8)
1930
+ but2.click(
1931
+ extract_f0_feature,
1932
+ [gpus6, np7, f0method8, if_f0_3, exp_dir1, version19, extraction_crepe_hop_length],
1933
+ [info2],
1934
+ )
1935
+ with gr.Row():
1936
+ with gr.Column():
1937
+ total_epoch11 = gr.Slider(
1938
+ minimum=1,
1939
+ maximum=5000,
1940
+ step=10,
1941
+ label="Total # of training epochs (IF you choose a value too high, your model will sound horribly overtrained.):",
1942
+ value=250,
1943
+ interactive=True,
1944
+ )
1945
+ butstop = gr.Button(
1946
+ "Stop Training",
1947
+ variant='primary',
1948
+ visible=False,
1949
+ )
1950
+ but3 = gr.Button("3. Train Model", variant="primary", visible=True)
1951
+
1952
+ but3.click(fn=stoptraining, inputs=[gr.Number(value=0, visible=False)], outputs=[but3, butstop])
1953
+ butstop.click(fn=stoptraining, inputs=[gr.Number(value=1, visible=False)], outputs=[butstop, but3])
1954
+
1955
+
1956
+ but4 = gr.Button("4. Train Index", variant="primary")
1957
+ info3 = gr.Textbox(label="Status (Check the Colab Notebook's cell output):", value="", max_lines=10)
1958
+ with gr.Accordion("Training Preferences (You can leave these as they are)", open=False):
1959
+ #gr.Markdown(value=i18n("step3: 填写训练设置, 开始训练模型和索引"))
1960
+ with gr.Column():
1961
+ save_epoch10 = gr.Slider(
1962
+ minimum=1,
1963
+ maximum=200,
1964
+ step=1,
1965
+ label="Backup every X amount of epochs:",
1966
+ value=10,
1967
+ interactive=True,
1968
+ )
1969
+ batch_size12 = gr.Slider(
1970
+ minimum=1,
1971
+ maximum=40,
1972
+ step=1,
1973
+ label="Batch Size (LEAVE IT unless you know what you're doing!):",
1974
+ value=default_batch_size,
1975
+ interactive=True,
1976
+ )
1977
+ if_save_latest13 = gr.Checkbox(
1978
+ label="Save only the latest '.ckpt' file to save disk space.",
1979
+ value=True,
1980
+ interactive=True,
1981
+ )
1982
+ if_cache_gpu17 = gr.Checkbox(
1983
+ label="Cache all training sets to GPU memory. Caching small datasets (less than 10 minutes) can speed up training, but caching large datasets will consume a lot of GPU memory and may not provide much speed improvement.",
1984
+ value=False,
1985
+ interactive=True,
1986
+ )
1987
+ if_save_every_weights18 = gr.Checkbox(
1988
+ label="Save a small final model to the 'weights' folder at each save point.",
1989
+ value=True,
1990
+ interactive=True,
1991
+ )
1992
+ zip_model = gr.Button('5. Download Model')
1993
+ zipped_model = gr.Files(label='Your Model and Index file can be downloaded here:')
1994
+ zip_model.click(fn=zip_downloader, inputs=[exp_dir1], outputs=[zipped_model, info3])
1995
+ with gr.Group():
1996
+ with gr.Accordion("Base Model Locations:", open=False, visible=False):
1997
+ pretrained_G14 = gr.Textbox(
1998
+ label=i18n("加载预训练底模G路径"),
1999
+ value="pretrained_v2/f0G40k.pth",
2000
+ interactive=True,
2001
+ )
2002
+ pretrained_D15 = gr.Textbox(
2003
+ label=i18n("加载预训练底模D路径"),
2004
+ value="pretrained_v2/f0D40k.pth",
2005
+ interactive=True,
2006
+ )
2007
+ gpus16 = gr.Textbox(
2008
+ label=i18n("以-分隔输入使用的卡号, 例如 0-1-2 使用卡0和卡1和卡2"),
2009
+ value=gpus,
2010
+ interactive=True,
2011
+ )
2012
+ sr2.change(
2013
+ change_sr2,
2014
+ [sr2, if_f0_3, version19],
2015
+ [pretrained_G14, pretrained_D15, version19],
2016
+ )
2017
+ version19.change(
2018
+ change_version19,
2019
+ [sr2, if_f0_3, version19],
2020
+ [pretrained_G14, pretrained_D15],
2021
+ )
2022
+ if_f0_3.change(
2023
+ change_f0,
2024
+ [if_f0_3, sr2, version19],
2025
+ [f0method8, pretrained_G14, pretrained_D15],
2026
+ )
2027
+ but5 = gr.Button(i18n("一键训练"), variant="primary", visible=False)
2028
+ but3.click(
2029
+ click_train,
2030
+ [
2031
+ exp_dir1,
2032
+ sr2,
2033
+ if_f0_3,
2034
+ spk_id5,
2035
+ save_epoch10,
2036
+ total_epoch11,
2037
+ batch_size12,
2038
+ if_save_latest13,
2039
+ pretrained_G14,
2040
+ pretrained_D15,
2041
+ gpus16,
2042
+ if_cache_gpu17,
2043
+ if_save_every_weights18,
2044
+ version19,
2045
+ ],
2046
+ [
2047
+ info3,
2048
+ butstop,
2049
+ but3,
2050
+ ],
2051
+ )
2052
+ but4.click(train_index, [exp_dir1, version19], info3)
2053
+ but5.click(
2054
+ train1key,
2055
+ [
2056
+ exp_dir1,
2057
+ sr2,
2058
+ if_f0_3,
2059
+ trainset_dir4,
2060
+ spk_id5,
2061
+ np7,
2062
+ f0method8,
2063
+ save_epoch10,
2064
+ total_epoch11,
2065
+ batch_size12,
2066
+ if_save_latest13,
2067
+ pretrained_G14,
2068
+ pretrained_D15,
2069
+ gpus16,
2070
+ if_cache_gpu17,
2071
+ if_save_every_weights18,
2072
+ version19,
2073
+ extraction_crepe_hop_length
2074
+ ],
2075
+ info3,
2076
+ )
2077
+
2078
+ else:
2079
+ print(
2080
+ "Pretrained weights not downloaded. Disabling training tab.\n"
2081
+ "Wondering how to train a voice? Visit here for the RVC model training guide: https://t.ly/RVC_Training_Guide\n"
2082
+ "-------------------------------\n"
2083
+ )
2084
+
2085
+ app.queue(concurrency_count=511, max_size=1022).launch(share=False, quiet=True)
2086
+ #endregion
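The Train tab in app.py is only built when `has_two_files_in_pretrained_folder()` returns true. The check can be sketched on its own as below, using only the standard library. The name `has_min_files` and its `minimum` parameter are hypothetical and do not appear in app.py. Like the original, it counts every directory entry, subfolders included.

```python
import os


def has_min_files(folder: str, minimum: int = 2) -> bool:
    """Return True when `folder` exists and holds at least `minimum` entries.

    Hypothetical helper that mirrors has_two_files_in_pretrained_folder() in app.py.
    """
    # A missing folder means the pretrained weights were never downloaded.
    if not os.path.isdir(folder):
        return False
    # os.listdir counts files and subdirectories alike, as the original does.
    return len(os.listdir(folder)) >= minimum
```

Calling `has_min_files("./pretrained/")` would reproduce the gate used before the Train tab is created, assuming the same folder layout.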
config.py ADDED
@@ -0,0 +1,204 @@
1
+ import argparse
2
+ import sys
3
+ import torch
4
+ import json
5
+ from multiprocessing import cpu_count
6
+
7
+ global usefp16
8
+ usefp16 = False
9
+
10
+
11
+ def use_fp32_config():
12
+ usefp16 = False
13
+ device_capability = 0
14
+ if torch.cuda.is_available():
15
+ device = torch.device("cuda:0") # Assuming you have only one GPU (index 0).
16
+ device_capability = torch.cuda.get_device_capability(device)[0]
17
+ if device_capability >= 7:
18
+ usefp16 = True
19
+ for config_file in ["32k.json", "40k.json", "48k.json"]:
20
+ with open(f"configs/{config_file}", "r") as d:
21
+ data = json.load(d)
22
+
23
+ if "train" in data and "fp16_run" in data["train"]:
24
+ data["train"]["fp16_run"] = True
25
+
26
+ with open(f"configs/{config_file}", "w") as d:
27
+ json.dump(data, d, indent=4)
28
+
29
+ print(f"Set fp16_run to true in {config_file}")
30
+
31
+ with open(
32
+ "trainset_preprocess_pipeline_print.py", "r", encoding="utf-8"
33
+ ) as f:
34
+ strr = f.read()
35
+
36
+ strr = strr.replace("3.0", "3.7")
37
+
38
+ with open(
39
+ "trainset_preprocess_pipeline_print.py", "w", encoding="utf-8"
40
+ ) as f:
41
+ f.write(strr)
42
+ else:
43
+ for config_file in ["32k.json", "40k.json", "48k.json"]:
44
+ with open(f"configs/{config_file}", "r") as f:
45
+ data = json.load(f)
46
+
47
+ if "train" in data and "fp16_run" in data["train"]:
48
+ data["train"]["fp16_run"] = False
49
+
50
+ with open(f"configs/{config_file}", "w") as d:
51
+ json.dump(data, d, indent=4)
52
+
53
+ print(f"Set fp16_run to false in {config_file}")
54
+
55
+ with open(
56
+ "trainset_preprocess_pipeline_print.py", "r", encoding="utf-8"
57
+ ) as f:
58
+ strr = f.read()
59
+
60
+ strr = strr.replace("3.7", "3.0")
61
+
62
+ with open(
63
+ "trainset_preprocess_pipeline_print.py", "w", encoding="utf-8"
64
+ ) as f:
65
+ f.write(strr)
66
+ else:
67
+ print(
68
+ "CUDA is not available. Make sure you have an NVIDIA GPU and CUDA installed."
69
+ )
70
+ return (usefp16, device_capability)
71
+
72
+
73
+ class Config:
74
+ def __init__(self):
75
+ self.device = "cuda:0"
76
+ self.is_half = True
77
+ self.n_cpu = 0
78
+ self.gpu_name = None
79
+ self.gpu_mem = None
80
+ (
81
+ self.python_cmd,
82
+ self.listen_port,
83
+ self.iscolab,
84
+ self.noparallel,
85
+ self.noautoopen,
86
+ self.paperspace,
87
+ self.is_cli,
88
+ ) = self.arg_parse()
89
+
90
+ self.x_pad, self.x_query, self.x_center, self.x_max = self.device_config()
91
+
92
+ @staticmethod
93
+ def arg_parse() -> tuple:
94
+ exe = sys.executable or "python"
95
+ parser = argparse.ArgumentParser()
96
+ parser.add_argument("--port", type=int, default=7865, help="Listen port")
97
+ parser.add_argument("--pycmd", type=str, default=exe, help="Python command")
98
+ parser.add_argument("--colab", action="store_true", help="Launch in colab")
99
+ parser.add_argument(
100
+ "--noparallel", action="store_true", help="Disable parallel processing"
101
+ )
102
+ parser.add_argument(
103
+ "--noautoopen",
104
+ action="store_true",
105
+ help="Do not open in browser automatically",
106
+ )
107
+ parser.add_argument( # Fork Feature. Paperspace integration for web UI
108
+ "--paperspace",
109
+ action="store_true",
110
+ help="Note that this argument just shares a gradio link for the web UI. Thus can be used on other non-local CLI systems.",
111
+ )
112
+ parser.add_argument( # Fork Feature. Embed a CLI into the infer-web.py
113
+ "--is_cli",
114
+ action="store_true",
115
+ help="Use the CLI instead of setting up a gradio UI. This flag will launch an RVC text interface where you can execute functions from infer-web.py!",
116
+ )
117
+ cmd_opts = parser.parse_args()
118
+
119
+ cmd_opts.port = cmd_opts.port if 0 <= cmd_opts.port <= 65535 else 7865
120
+
121
+ return (
122
+ cmd_opts.pycmd,
123
+ cmd_opts.port,
124
+ cmd_opts.colab,
125
+ cmd_opts.noparallel,
126
+ cmd_opts.noautoopen,
127
+ cmd_opts.paperspace,
128
+ cmd_opts.is_cli,
129
+ )
130
+
131
+ # has_mps is only available in nightly pytorch (for now) and macOS 12.3+.
132
+ # check `getattr` and try it for compatibility
133
+ @staticmethod
134
+ def has_mps() -> bool:
135
+ if not torch.backends.mps.is_available():
136
+ return False
137
+ try:
138
+ torch.zeros(1).to(torch.device("mps"))
139
+ return True
140
+ except Exception:
141
+ return False
142
+
143
+ def device_config(self) -> tuple:
144
+ if torch.cuda.is_available():
145
+ i_device = int(self.device.split(":")[-1])
146
+ self.gpu_name = torch.cuda.get_device_name(i_device)
147
+ if (
148
+ ("16" in self.gpu_name and "V100" not in self.gpu_name.upper())
149
+ or "P40" in self.gpu_name.upper()
150
+ or "1060" in self.gpu_name
151
+ or "1070" in self.gpu_name
152
+ or "1080" in self.gpu_name
153
+ ):
154
+ print("Found GPU", self.gpu_name, ", force to fp32")
155
+ self.is_half = False
156
+ else:
157
+ print("Found GPU", self.gpu_name)
158
+ use_fp32_config()
159
+ self.gpu_mem = int(
160
+ torch.cuda.get_device_properties(i_device).total_memory
161
+ / 1024
162
+ / 1024
163
+ / 1024
164
+ + 0.4
165
+ )
166
+ if self.gpu_mem <= 4:
167
+ with open("trainset_preprocess_pipeline_print.py", "r") as f:
168
+ strr = f.read().replace("3.7", "3.0")
169
+ with open("trainset_preprocess_pipeline_print.py", "w") as f:
170
+ f.write(strr)
171
+ elif self.has_mps():
172
+ print("No supported Nvidia GPU found, use MPS instead")
173
+ self.device = "mps"
174
+ self.is_half = False
175
+ use_fp32_config()
176
+ else:
177
+ print("No supported Nvidia GPU found, use CPU instead")
178
+ self.device = "cpu"
179
+ self.is_half = False
180
+ use_fp32_config()
181
+
182
+ if self.n_cpu == 0:
183
+ self.n_cpu = cpu_count()
184
+
185
+ if self.is_half:
186
+ # Settings for 6 GB of VRAM
187
+ x_pad = 3
188
+ x_query = 10
189
+ x_center = 60
190
+ x_max = 65
191
+ else:
192
+ # Settings for 5 GB of VRAM
193
+ x_pad = 1
194
+ x_query = 6
195
+ x_center = 38
196
+ x_max = 41
197
+
198
+ if self.gpu_mem is not None and self.gpu_mem <= 4:
199
+ x_pad = 1
200
+ x_query = 5
201
+ x_center = 30
202
+ x_max = 32
203
+
204
+ return x_pad, x_query, x_center, x_max
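The tail of `device_config()` picks the four padding parameters from two inputs: whether half precision is enabled and how much VRAM was detected. A minimal sketch of that selection as a pure function is shown below. The names `select_padding` and `round_vram_gb` are hypothetical and not part of config.py; the numeric values are copied from the diff above.

```python
from typing import Optional, Tuple


def select_padding(is_half: bool, gpu_mem_gb: Optional[int]) -> Tuple[int, int, int, int]:
    """Return (x_pad, x_query, x_center, x_max) using the branches of device_config()."""
    if is_half:
        params = (3, 10, 60, 65)  # half precision, the 6 GB VRAM settings
    else:
        params = (1, 6, 38, 41)  # full precision, the 5 GB VRAM settings
    # GPUs with 4 GB or less override either choice.
    if gpu_mem_gb is not None and gpu_mem_gb <= 4:
        params = (1, 5, 30, 32)
    return params


def round_vram_gb(total_bytes: int) -> int:
    """Convert bytes to whole GiB as device_config() does (adds 0.4, then truncates)."""
    return int(total_bytes / 1024 / 1024 / 1024 + 0.4)
```

Keeping the selection free of side effects makes each branch easy to check without a GPU, which the in-place version in `device_config()` does not allow.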
gitattributes.txt ADDED
@@ -0,0 +1,35 @@
1
+ *.7z filter=lfs diff=lfs merge=lfs -text
2
+ *.arrow filter=lfs diff=lfs merge=lfs -text
3
+ *.bin filter=lfs diff=lfs merge=lfs -text
4
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
5
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
6
+ *.ftz filter=lfs diff=lfs merge=lfs -text
7
+ *.gz filter=lfs diff=lfs merge=lfs -text
8
+ *.h5 filter=lfs diff=lfs merge=lfs -text
9
+ *.joblib filter=lfs diff=lfs merge=lfs -text
10
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
11
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
12
+ *.model filter=lfs diff=lfs merge=lfs -text
13
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
14
+ *.npy filter=lfs diff=lfs merge=lfs -text
15
+ *.npz filter=lfs diff=lfs merge=lfs -text
16
+ *.onnx filter=lfs diff=lfs merge=lfs -text
17
+ *.ot filter=lfs diff=lfs merge=lfs -text
18
+ *.parquet filter=lfs diff=lfs merge=lfs -text
19
+ *.pb filter=lfs diff=lfs merge=lfs -text
20
+ *.pickle filter=lfs diff=lfs merge=lfs -text
21
+ *.pkl filter=lfs diff=lfs merge=lfs -text
22
+ *.pt filter=lfs diff=lfs merge=lfs -text
23
+ *.pth filter=lfs diff=lfs merge=lfs -text
24
+ *.rar filter=lfs diff=lfs merge=lfs -text
25
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
26
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
27
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
28
+ *.tar filter=lfs diff=lfs merge=lfs -text
29
+ *.tflite filter=lfs diff=lfs merge=lfs -text
30
+ *.tgz filter=lfs diff=lfs merge=lfs -text
31
+ *.wasm filter=lfs diff=lfs merge=lfs -text
32
+ *.xz filter=lfs diff=lfs merge=lfs -text
33
+ *.zip filter=lfs diff=lfs merge=lfs -text
34
+ *.zst filter=lfs diff=lfs merge=lfs -text
35
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
gitignore.txt ADDED
@@ -0,0 +1,12 @@
1
+ __pycache__/
2
+ weights/
3
+ TEMP/
4
+ logs/
5
+ csvdb/
6
+
7
+ # Environment
8
+ venv/
9
+
10
+ # Models
11
+ hubert_base.pt
12
+ rmvpe.pt
i18n.py ADDED
@@ -0,0 +1,28 @@
1
+ import locale
2
+ import json
3
+ import os
4
+
5
+
6
+ def load_language_list(language):
7
+ with open(f"./i18n/{language}.json", "r", encoding="utf-8") as f:
8
+ language_list = json.load(f)
9
+ return language_list
10
+
11
+
12
+ class I18nAuto:
13
+ def __init__(self, language=None):
14
+ if language in ["Auto", None]:
15
+ language = locale.getdefaultlocale()[
16
+ 0
17
+ ] # getlocale can't identify the system's language ((None, None))
18
+ if not os.path.exists(f"./i18n/{language}.json"):
19
+ language = "en_US"
20
+ self.language = language
21
+ # print("Use Language:", language)
22
+ self.language_map = load_language_list(language)
23
+
24
+ def __call__(self, key):
25
+ return self.language_map.get(key, key)
26
+
27
+ def print(self):
28
+ print("Use Language:", self.language)
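`I18nAuto.__call__` returns the key itself when no translation exists, which is why the Chinese label strings in app.py still render when an entry is missing from the language file. The fallback can be sketched as follows. `DictTranslator` and `demo` are hypothetical names used for illustration only, and the translation file is written to a temporary directory instead of `./i18n/`.

```python
import json
import os
import tempfile


class DictTranslator:
    """Hypothetical stand-in for I18nAuto: unknown keys fall back to themselves."""

    def __init__(self, path: str):
        # Load a flat {source string: translated string} JSON mapping.
        with open(path, "r", encoding="utf-8") as f:
            self.language_map = json.load(f)

    def __call__(self, key: str) -> str:
        # dict.get with the key as default gives the pass-through behaviour.
        return self.language_map.get(key, key)


def demo() -> tuple:
    """Write a one-entry translation file and look up a known and an unknown key."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "en_US.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"转换": "Convert"}, f, ensure_ascii=False)
        translate = DictTranslator(path)
        return translate("转换"), translate("missing key")
```

The pass-through default means a partially translated language file degrades to the source strings instead of raising `KeyError`.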
packages.txt ADDED
@@ -0,0 +1,3 @@
1
+ build-essential
2
+ ffmpeg
3
+ aria2
requirements.txt ADDED
@@ -0,0 +1,22 @@
1
+ gTTS
2
+ elevenlabs
3
+ stftpitchshift==1.5.1
4
+ torchcrepe
5
+ setuptools
6
+ wheel
7
+ httpx==0.23.0
8
+ faiss-gpu
9
+ fairseq
10
+ gradio==3.34.0
11
+ ffmpeg-python
12
+ praat-parselmouth
13
+ pyworld
14
+ numpy==1.23.5
15
+ i18n
16
+ numba==0.56.4
17
+ librosa==0.9.2
18
+ mega.py
19
+ gdown
20
+ onnxruntime
21
+ pyngrok==4.1.12
22
+ torch
rmvpe.py ADDED
@@ -0,0 +1,432 @@
1
+ import sys, torch, numpy as np, traceback, pdb
2
+ import torch.nn as nn
3
+ from time import time as ttime
4
+ import torch.nn.functional as F
5
+
6
+
7
+ class BiGRU(nn.Module):
8
+ def __init__(self, input_features, hidden_features, num_layers):
9
+ super(BiGRU, self).__init__()
10
+ self.gru = nn.GRU(
11
+ input_features,
12
+ hidden_features,
13
+ num_layers=num_layers,
14
+ batch_first=True,
15
+ bidirectional=True,
16
+ )
17
+
18
+ def forward(self, x):
19
+ return self.gru(x)[0]
20
+
21
+
22
+ class ConvBlockRes(nn.Module):
23
+ def __init__(self, in_channels, out_channels, momentum=0.01):
24
+ super(ConvBlockRes, self).__init__()
25
+ self.conv = nn.Sequential(
26
+ nn.Conv2d(
27
+ in_channels=in_channels,
28
+ out_channels=out_channels,
29
+ kernel_size=(3, 3),
30
+ stride=(1, 1),
31
+ padding=(1, 1),
32
+ bias=False,
33
+ ),
34
+ nn.BatchNorm2d(out_channels, momentum=momentum),
35
+ nn.ReLU(),
36
+ nn.Conv2d(
37
+ in_channels=out_channels,
38
+ out_channels=out_channels,
39
+ kernel_size=(3, 3),
40
+ stride=(1, 1),
41
+ padding=(1, 1),
42
+ bias=False,
43
+ ),
44
+ nn.BatchNorm2d(out_channels, momentum=momentum),
45
+ nn.ReLU(),
46
+ )
47
+ if in_channels != out_channels:
48
+ self.shortcut = nn.Conv2d(in_channels, out_channels, (1, 1))
49
+ self.is_shortcut = True
50
+ else:
51
+ self.is_shortcut = False
52
+
53
+ def forward(self, x):
54
+ if self.is_shortcut:
55
+ return self.conv(x) + self.shortcut(x)
56
+ else:
57
+ return self.conv(x) + x
58
+
59
+
60
+ class Encoder(nn.Module):
61
+ def __init__(
62
+ self,
63
+ in_channels,
64
+ in_size,
65
+ n_encoders,
66
+ kernel_size,
67
+ n_blocks,
68
+ out_channels=16,
69
+ momentum=0.01,
70
+ ):
71
+ super(Encoder, self).__init__()
72
+ self.n_encoders = n_encoders
73
+ self.bn = nn.BatchNorm2d(in_channels, momentum=momentum)
74
+ self.layers = nn.ModuleList()
75
+ self.latent_channels = []
76
+ for i in range(self.n_encoders):
77
+ self.layers.append(
78
+ ResEncoderBlock(
79
+ in_channels, out_channels, kernel_size, n_blocks, momentum=momentum
80
+ )
81
+ )
82
+ self.latent_channels.append([out_channels, in_size])
83
+ in_channels = out_channels
84
+ out_channels *= 2
85
+ in_size //= 2
86
+ self.out_size = in_size
87
+ self.out_channel = out_channels
88
+
89
+ def forward(self, x):
90
+ concat_tensors = []
91
+ x = self.bn(x)
92
+ for i in range(self.n_encoders):
93
+ _, x = self.layers[i](x)
94
+ concat_tensors.append(_)
95
+ return x, concat_tensors
96
+
97
+
98
+ class ResEncoderBlock(nn.Module):
99
+ def __init__(
100
+ self, in_channels, out_channels, kernel_size, n_blocks=1, momentum=0.01
101
+ ):
102
+ super(ResEncoderBlock, self).__init__()
103
+ self.n_blocks = n_blocks
104
+ self.conv = nn.ModuleList()
105
+ self.conv.append(ConvBlockRes(in_channels, out_channels, momentum))
106
+ for i in range(n_blocks - 1):
107
+ self.conv.append(ConvBlockRes(out_channels, out_channels, momentum))
108
+ self.kernel_size = kernel_size
109
+ if self.kernel_size is not None:
110
+ self.pool = nn.AvgPool2d(kernel_size=kernel_size)
111
+
112
+ def forward(self, x):
113
+ for i in range(self.n_blocks):
114
+ x = self.conv[i](x)
115
+ if self.kernel_size is not None:
116
+ return x, self.pool(x)
117
+ else:
118
+ return x
119
+
120
+
121
+ class Intermediate(nn.Module): #
122
+ def __init__(self, in_channels, out_channels, n_inters, n_blocks, momentum=0.01):
123
+ super(Intermediate, self).__init__()
124
+ self.n_inters = n_inters
125
+ self.layers = nn.ModuleList()
126
+ self.layers.append(
127
+ ResEncoderBlock(in_channels, out_channels, None, n_blocks, momentum)
128
+ )
129
+ for i in range(self.n_inters - 1):
130
+ self.layers.append(
131
+ ResEncoderBlock(out_channels, out_channels, None, n_blocks, momentum)
132
+ )
133
+
134
+ def forward(self, x):
135
+ for i in range(self.n_inters):
136
+ x = self.layers[i](x)
137
+ return x
138
+
139
+
140
+ class ResDecoderBlock(nn.Module):
141
+ def __init__(self, in_channels, out_channels, stride, n_blocks=1, momentum=0.01):
142
+ super(ResDecoderBlock, self).__init__()
143
+ out_padding = (0, 1) if stride == (1, 2) else (1, 1)
144
+ self.n_blocks = n_blocks
145
+ self.conv1 = nn.Sequential(
146
+ nn.ConvTranspose2d(
147
+ in_channels=in_channels,
148
+ out_channels=out_channels,
149
+ kernel_size=(3, 3),
150
+ stride=stride,
151
+ padding=(1, 1),
152
+ output_padding=out_padding,
153
+ bias=False,
154
+ ),
155
+ nn.BatchNorm2d(out_channels, momentum=momentum),
156
+ nn.ReLU(),
157
+ )
158
+ self.conv2 = nn.ModuleList()
159
+ self.conv2.append(ConvBlockRes(out_channels * 2, out_channels, momentum))
160
+ for i in range(n_blocks - 1):
161
+ self.conv2.append(ConvBlockRes(out_channels, out_channels, momentum))
162
+
163
+ def forward(self, x, concat_tensor):
164
+ x = self.conv1(x)
165
+ x = torch.cat((x, concat_tensor), dim=1)
166
+ for i in range(self.n_blocks):
167
+ x = self.conv2[i](x)
168
+ return x
169
+
170
+
171
+ class Decoder(nn.Module):
172
+ def __init__(self, in_channels, n_decoders, stride, n_blocks, momentum=0.01):
173
+ super(Decoder, self).__init__()
174
+ self.layers = nn.ModuleList()
175
+ self.n_decoders = n_decoders
176
+ for i in range(self.n_decoders):
177
+ out_channels = in_channels // 2
178
+ self.layers.append(
179
+ ResDecoderBlock(in_channels, out_channels, stride, n_blocks, momentum)
180
+ )
181
+ in_channels = out_channels
182
+
183
+ def forward(self, x, concat_tensors):
184
+ for i in range(self.n_decoders):
185
+ x = self.layers[i](x, concat_tensors[-1 - i])
186
+ return x
187
+
188
+
189
+ class DeepUnet(nn.Module):
190
+ def __init__(
191
+ self,
192
+ kernel_size,
193
+ n_blocks,
194
+ en_de_layers=5,
195
+ inter_layers=4,
196
+ in_channels=1,
197
+ en_out_channels=16,
198
+ ):
199
+ super(DeepUnet, self).__init__()
200
+ self.encoder = Encoder(
201
+ in_channels, 128, en_de_layers, kernel_size, n_blocks, en_out_channels
202
+ )
203
+ self.intermediate = Intermediate(
204
+ self.encoder.out_channel // 2,
205
+ self.encoder.out_channel,
206
+ inter_layers,
207
+ n_blocks,
208
+ )
209
+ self.decoder = Decoder(
210
+ self.encoder.out_channel, en_de_layers, kernel_size, n_blocks
211
+ )
212
+
213
+ def forward(self, x):
214
+ x, concat_tensors = self.encoder(x)
215
+ x = self.intermediate(x)
216
+ x = self.decoder(x, concat_tensors)
217
+ return x
218
+
219
+
220
+ class E2E(nn.Module):
221
+ def __init__(
222
+ self,
223
+ n_blocks,
224
+ n_gru,
225
+ kernel_size,
226
+ en_de_layers=5,
227
+ inter_layers=4,
228
+ in_channels=1,
229
+ en_out_channels=16,
230
+ ):
231
+ super(E2E, self).__init__()
232
+ self.unet = DeepUnet(
233
+ kernel_size,
234
+ n_blocks,
235
+ en_de_layers,
236
+ inter_layers,
237
+ in_channels,
238
+ en_out_channels,
239
+ )
240
+ self.cnn = nn.Conv2d(en_out_channels, 3, (3, 3), padding=(1, 1))
241
+ if n_gru:
242
+ self.fc = nn.Sequential(
243
+ BiGRU(3 * 128, 256, n_gru),
244
+ nn.Linear(512, 360),
245
+ nn.Dropout(0.25),
246
+ nn.Sigmoid(),
247
+ )
248
+ else:
249
+ self.fc = nn.Sequential(
250
+ nn.Linear(3 * 128, 360), nn.Dropout(0.25), nn.Sigmoid()
251
+ )
252
+
253
+ def forward(self, mel):
254
+ mel = mel.transpose(-1, -2).unsqueeze(1)
255
+ x = self.cnn(self.unet(mel)).transpose(1, 2).flatten(-2)
256
+ x = self.fc(x)
257
+ return x
258
+
259
+
260
+ from librosa.filters import mel
261
+
262
+
263
+ class MelSpectrogram(torch.nn.Module):
264
+ def __init__(
265
+ self,
266
+ is_half,
267
+ n_mel_channels,
268
+ sampling_rate,
269
+ win_length,
270
+ hop_length,
271
+ n_fft=None,
272
+ mel_fmin=0,
273
+ mel_fmax=None,
274
+ clamp=1e-5,
275
+ ):
276
+ super().__init__()
277
+ n_fft = win_length if n_fft is None else n_fft
278
+ self.hann_window = {}
279
+ mel_basis = mel(
280
+ sr=sampling_rate,
281
+ n_fft=n_fft,
282
+ n_mels=n_mel_channels,
283
+ fmin=mel_fmin,
284
+ fmax=mel_fmax,
285
+ htk=True,
286
+ )
287
+ mel_basis = torch.from_numpy(mel_basis).float()
288
+ self.register_buffer("mel_basis", mel_basis)
289
+ self.n_fft = win_length if n_fft is None else n_fft
290
+ self.hop_length = hop_length
291
+ self.win_length = win_length
292
+ self.sampling_rate = sampling_rate
293
+ self.n_mel_channels = n_mel_channels
294
+ self.clamp = clamp
295
+ self.is_half = is_half
296
+
+     def forward(self, audio, keyshift=0, speed=1, center=True):
+         factor = 2 ** (keyshift / 12)
+         n_fft_new = int(np.round(self.n_fft * factor))
+         win_length_new = int(np.round(self.win_length * factor))
+         hop_length_new = int(np.round(self.hop_length * speed))
+         keyshift_key = str(keyshift) + "_" + str(audio.device)
+         if keyshift_key not in self.hann_window:
+             self.hann_window[keyshift_key] = torch.hann_window(win_length_new).to(
+                 audio.device
+             )
+         fft = torch.stft(
+             audio,
+             n_fft=n_fft_new,
+             hop_length=hop_length_new,
+             win_length=win_length_new,
+             window=self.hann_window[keyshift_key],
+             center=center,
+             return_complex=True,
+         )
+         magnitude = torch.sqrt(fft.real.pow(2) + fft.imag.pow(2))
+         if keyshift != 0:
+             size = self.n_fft // 2 + 1
+             resize = magnitude.size(1)
+             if resize < size:
+                 magnitude = F.pad(magnitude, (0, 0, 0, size - resize))
+             magnitude = magnitude[:, :size, :] * self.win_length / win_length_new
+         mel_output = torch.matmul(self.mel_basis, magnitude)
+         if self.is_half:
+             mel_output = mel_output.half()
+         log_mel_spec = torch.log(torch.clamp(mel_output, min=self.clamp))
+         return log_mel_spec
+
+
+ class RMVPE:
+     def __init__(self, model_path, is_half, device=None):
+         self.resample_kernel = {}
+         model = E2E(4, 1, (2, 2))
+         ckpt = torch.load(model_path, map_location="cpu")
+         model.load_state_dict(ckpt)
+         model.eval()
+         if is_half:
+             model = model.half()
+         self.model = model
+         self.is_half = is_half
+         if device is None:
+             device = "cuda" if torch.cuda.is_available() else "cpu"
+         self.device = device
+         self.mel_extractor = MelSpectrogram(
+             is_half, 128, 16000, 1024, 160, None, 30, 8000
+         ).to(device)
+         self.model = self.model.to(device)
+         # 360 pitch bins spaced 20 cents apart, zero-padded by 4 bins on each side
+         cents_mapping = 20 * np.arange(360) + 1997.3794084376191
+         self.cents_mapping = np.pad(cents_mapping, (4, 4))  # 368
+
+     def mel2hidden(self, mel):
+         with torch.no_grad():
+             n_frames = mel.shape[-1]
+             # Pad the frame axis up to the next multiple of 32, then trim the output back
+             mel = F.pad(
+                 mel, (0, 32 * ((n_frames - 1) // 32 + 1) - n_frames), mode="reflect"
+             )
+             hidden = self.model(mel)
+             return hidden[:, :n_frames]
+
+     def decode(self, hidden, thred=0.03):
+         cents_pred = self.to_local_average_cents(hidden, thred=thred)
+         f0 = 10 * (2 ** (cents_pred / 1200))
+         # Frames below the threshold have cents == 0, i.e. exactly 10 Hz: mark them unvoiced
+         f0[f0 == 10] = 0
+         # f0 = np.array([10 * (2 ** (cent_pred / 1200)) if cent_pred else 0 for cent_pred in cents_pred])
+         return f0
+
+     def infer_from_audio(self, audio, thred=0.03):
+         audio = torch.from_numpy(audio).float().to(self.device).unsqueeze(0)
+         # torch.cuda.synchronize()
+         # t0=ttime()
+         mel = self.mel_extractor(audio, center=True)
+         # torch.cuda.synchronize()
+         # t1=ttime()
+         hidden = self.mel2hidden(mel)
+         # torch.cuda.synchronize()
+         # t2=ttime()
+         hidden = hidden.squeeze(0).cpu().numpy()
+         if self.is_half:
+             hidden = hidden.astype("float32")
+         f0 = self.decode(hidden, thred=thred)
+         # torch.cuda.synchronize()
+         # t3=ttime()
+         # print("hmvpe:%s\t%s\t%s\t%s"%(t1-t0,t2-t1,t3-t2,t3-t0))
+         return f0
+
+     def to_local_average_cents(self, salience, thred=0.05):
+         # t0 = ttime()
+         center = np.argmax(salience, axis=1)  # (n_frames,) index of the peak bin
+         salience = np.pad(salience, ((0, 0), (4, 4)))  # (n_frames, 368)
+         # t1 = ttime()
+         center += 4
+         todo_salience = []
+         todo_cents_mapping = []
+         starts = center - 4
+         ends = center + 5
+         for idx in range(salience.shape[0]):
+             todo_salience.append(salience[idx, starts[idx] : ends[idx]])
+             todo_cents_mapping.append(self.cents_mapping[starts[idx] : ends[idx]])
+         # t2 = ttime()
+         todo_salience = np.array(todo_salience)  # (n_frames, 9)
+         todo_cents_mapping = np.array(todo_cents_mapping)  # (n_frames, 9)
+         product_sum = np.sum(todo_salience * todo_cents_mapping, 1)
+         weight_sum = np.sum(todo_salience, 1)  # (n_frames,)
+         devided = product_sum / weight_sum  # (n_frames,)
+         # t3 = ttime()
+         maxx = np.max(salience, axis=1)  # (n_frames,)
+         devided[maxx <= thred] = 0
+         # t4 = ttime()
+         # print("decode:%s\t%s\t%s\t%s" % (t1 - t0, t2 - t1, t3 - t2, t4 - t3))
+         return devided
+
+
+ # if __name__ == '__main__':
+ #     audio, sampling_rate = sf.read("卢本伟语录~1.wav")
+ #     if len(audio.shape) > 1:
+ #         audio = librosa.to_mono(audio.transpose(1, 0))
+ #     audio_bak = audio.copy()
+ #     if sampling_rate != 16000:
+ #         audio = librosa.resample(audio, orig_sr=sampling_rate, target_sr=16000)
+ #     model_path = "/bili-coeus/jupyter/jupyterhub-liujing04/vits_ch/test-RMVPE/weights/rmvpe_llc_half.pt"
+ #     thred = 0.03  # 0.01
+ #     device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ #     rmvpe = RMVPE(model_path,is_half=False, device=device)
+ #     t0=ttime()
+ #     f0 = rmvpe.infer_from_audio(audio, thred=thred)
+ #     f0 = rmvpe.infer_from_audio(audio, thred=thred)
+ #     f0 = rmvpe.infer_from_audio(audio, thred=thred)
+ #     f0 = rmvpe.infer_from_audio(audio, thred=thred)
+ #     f0 = rmvpe.infer_from_audio(audio, thred=thred)
+ #     t1=ttime()
+ #     print(f0.shape,t1-t0)
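The decoding step in `RMVPE.decode` and `to_local_average_cents` can be illustrated for a single frame without NumPy. The following is a minimal sketch, not the code from this commit: `bin_to_cents` and `decode_frame` are hypothetical helper names, and the window is clipped at the array edges instead of zero-padded. The result is the same, because the padded bins carry zero salience and so contribute nothing to either sum. The constants (20 cents per bin, the 1997.379... offset, the 9-bin window, the 10 Hz reference) are taken from the file above.

```python
CENTS_OFFSET = 1997.3794084376191  # cents value of bin 0, as in rmvpe.py
CENTS_PER_BIN = 20


def bin_to_cents(index):
    """Cents value assigned to a pitch bin (hypothetical helper)."""
    return CENTS_PER_BIN * index + CENTS_OFFSET


def decode_frame(salience, threshold=0.03):
    """Turn one frame of per-bin salience into an f0 in Hz (0.0 if unvoiced)."""
    center = max(range(len(salience)), key=salience.__getitem__)
    if salience[center] <= threshold:
        return 0.0  # peak too weak: treat the frame as unvoiced
    # 9-bin window around the peak, clipped to the valid range
    lo = max(0, center - 4)
    hi = min(len(salience), center + 5)
    weights = salience[lo:hi]
    weighted = sum(w * bin_to_cents(i) for i, w in zip(range(lo, hi), weights))
    cents = weighted / sum(weights)
    # Cents are measured relative to 10 Hz; 1200 cents make one octave
    return 10 * 2 ** (cents / 1200)


if __name__ == "__main__":
    frame = [0.0] * 360
    frame[99], frame[100], frame[101] = 0.5, 1.0, 0.5
    print(decode_frame(frame))
```

Averaging over the neighbouring bins lets the estimate fall between bin centres, so the resolution is finer than the 20-cent bin spacing.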
run.sh ADDED
@@ -0,0 +1,16 @@
+ # Install Debian packages
+ sudo apt-get update
+ sudo apt-get install -qq -y build-essential ffmpeg aria2
+
+ # Upgrade pip and setuptools
+ pip install --upgrade pip
+ pip install --upgrade setuptools
+
+ # Install wheel package (built-package format for Python)
+ pip install wheel
+
+ # Install Python packages using pip
+ pip install -r requirements.txt
+
+ # Run application locally at http://127.0.0.1:7860
+ python app.py
utils.py ADDED
@@ -0,0 +1,151 @@
+ import ffmpeg
+ import numpy as np
+
+ # import praatio
+ # import praatio.praat_scripts
+ import os
+ import sys
+
+ import random
+
+ import csv
+
+ platform_stft_mapping = {
+     "linux": "stftpitchshift",
+     "darwin": "stftpitchshift",
+     "win32": "stftpitchshift.exe",
+ }
+
+ stft = platform_stft_mapping.get(sys.platform)
+ # praatEXE = join('.',os.path.abspath(os.getcwd()) + r"\Praat.exe")
+
+
+ def CSVutil(file, rw, type, *args):
+     if type == "formanting":
+         if rw == "r":
+             with open(file) as fileCSVread:
+                 csv_reader = list(csv.reader(fileCSVread))
+                 # An empty file yields an empty list, never None
+                 if not csv_reader:
+                     raise ValueError("No data")
+                 return (csv_reader[0][0], csv_reader[0][1], csv_reader[0][2])
+         else:
+             if args:
+                 doformnt = args[0]
+             else:
+                 doformnt = False
+             qfr = args[1] if len(args) > 1 else 1.0
+             tmb = args[2] if len(args) > 2 else 1.0
+             with open(file, rw, newline="") as fileCSVwrite:
+                 csv_writer = csv.writer(fileCSVwrite, delimiter=",")
+                 csv_writer.writerow([doformnt, qfr, tmb])
+     elif type == "stop":
+         stop = args[0] if args else False
+         with open(file, rw, newline="") as fileCSVwrite:
+             csv_writer = csv.writer(fileCSVwrite, delimiter=",")
+             csv_writer.writerow([stop])
+
+
+ def load_audio(file, sr, DoFormant, Quefrency, Timbre):
+     converted = False
+     # NOTE: the DoFormant, Quefrency and Timbre arguments are overridden by the CSV values
+     DoFormant, Quefrency, Timbre = CSVutil("csvdb/formanting.csv", "r", "formanting")
+     try:
+         # https://github.com/openai/whisper/blob/main/whisper/audio.py#L26
+         # This launches a subprocess to decode audio while down-mixing and resampling as necessary.
+         # Requires the ffmpeg CLI and `ffmpeg-python` package to be installed.
+         file = (
+             file.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+         )  # Guard against pasted paths with stray leading/trailing spaces, quotes and newlines
+         file_formanted = file.strip(" ").strip('"').strip("\n").strip('"').strip(" ")
+
+         # print(f"dofor={bool(DoFormant)} timbr={Timbre} quef={Quefrency}\n")
+
+         if (
+             lambda DoFormant: True
+             if DoFormant.lower() == "true"
+             else (False if DoFormant.lower() == "false" else DoFormant)
+         )(DoFormant):
+             numerator = round(random.uniform(1, 4), 4)
+             # os.system(f"stftpitchshift -i {file} -q {Quefrency} -t {Timbre} -o {file_formanted}")
+             # print('stftpitchshift -i "%s" -p 1.0 --rms -w 128 -v 8 -q %s -t %s -o "%s"' % (file, Quefrency, Timbre, file_formanted))
+
+             if not file.endswith(".wav"):
+                 if not os.path.isfile(f"{file_formanted}.wav"):
+                     converted = True
+                     # print(f"\nfile = {file}\n")
+                     # print(f"\nfile_formanted = {file_formanted}\n")
+                     converting = (
+                         ffmpeg.input(file_formanted, threads=0)
+                         .output(f"{file_formanted}.wav")
+                         .run(
+                             cmd=["ffmpeg", "-nostdin"],
+                             capture_stdout=True,
+                             capture_stderr=True,
+                         )
+                     )
+                 else:
+                     pass
+
+             file_formanted = (
+                 f"{file_formanted}.wav"
+                 if not file_formanted.endswith(".wav")
+                 else file_formanted
+             )
+
+             print(f" · Formanting {file_formanted}...\n")
+
+             os.system(
+                 '%s -i "%s" -q "%s" -t "%s" -o "%sFORMANTED_%s.wav"'
+                 % (
+                     stft,
+                     file_formanted,
+                     Quefrency,
+                     Timbre,
+                     file_formanted,
+                     str(numerator),
+                 )
+             )
+
+             print(f" · Formanted {file_formanted}!\n")
+
+             # filepraat = (os.path.abspath(os.getcwd()) + '\\' + file).replace('/','\\')
+             # file_formantedpraat = ('"' + os.path.abspath(os.getcwd()) + '/' + 'formanted'.join(file_formanted) + '"').replace('/','\\')
+             # print("%sFORMANTED_%s.wav" % (file_formanted, str(numerator)))
+
+             out, _ = (
+                 ffmpeg.input(
+                     "%sFORMANTED_%s.wav" % (file_formanted, str(numerator)), threads=0
+                 )
+                 .output("-", format="f32le", acodec="pcm_f32le", ac=1, ar=sr)
+                 .run(
+                     cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
+                 )
+             )
+
+             try:
+                 os.remove("%sFORMANTED_%s.wav" % (file_formanted, str(numerator)))
+             except Exception:
+                 print("couldn't remove formanted type of file")
+
+         else:
+             out, _ = (
+                 ffmpeg.input(file, threads=0)
+                 .output("-", format="f32le", acodec="pcm_f32le", ac=1, ar=sr)
+                 .run(
+                     cmd=["ffmpeg", "-nostdin"], capture_stdout=True, capture_stderr=True
+                 )
+             )
+     except Exception as e:
+         raise RuntimeError(f"Failed to load audio: {e}")
+
+     if converted:
+         try:
+             os.remove(file_formanted)
+         except Exception:
+             print("couldn't remove converted type of file")
+         converted = False
+
+     return np.frombuffer(out, np.float32).flatten()
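`load_audio` asks ffmpeg to write raw mono samples to stdout in the `f32le` format (32-bit little-endian floats) and then reinterprets those bytes with `np.frombuffer`. The sketch below shows the same byte-level decoding using only the standard library, to make the format explicit. `pcm_f32le_to_floats` is a hypothetical helper name, not part of this commit, and the input bytes are built by hand rather than produced by ffmpeg.

```python
import struct


def pcm_f32le_to_floats(raw):
    """Decode raw little-endian 32-bit float PCM bytes into a list of samples."""
    if len(raw) % 4 != 0:
        raise ValueError("f32le data must be a multiple of 4 bytes long")
    count = len(raw) // 4
    # "<" forces little-endian regardless of the host byte order
    return list(struct.unpack("<%df" % count, raw))


if __name__ == "__main__":
    # Three samples that are exactly representable in float32
    raw = struct.pack("<3f", 0.0, 0.5, -1.0)
    print(pcm_f32le_to_floats(raw))
```

Note that `np.frombuffer(out, np.float32)` uses the host byte order, so it matches `f32le` on little-endian machines; the explicit `"<"` in the sketch removes that dependency.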
vc_infer_pipeline.py ADDED
@@ -0,0 +1,646 @@
+ import numpy as np, parselmouth, torch, pdb, sys, os
+ from time import time as ttime
+ import torch.nn.functional as F
+ import torchcrepe  # Fork feature. Use the crepe f0 algorithm. New dependency (pip install torchcrepe)
+ from torch import Tensor
+ import scipy.signal as signal
+ import pyworld, traceback, faiss, librosa
+ from functools import lru_cache
+
+ now_dir = os.getcwd()
+ sys.path.append(now_dir)
+
+ # 5th-order Butterworth high-pass filter at 48 Hz for 16 kHz audio
+ bh, ah = signal.butter(N=5, Wn=48, btype="high", fs=16000)
+
+ input_audio_path2wav = {}
+
+
+ @lru_cache
+ def cache_harvest_f0(input_audio_path, fs, f0max, f0min, frame_period):
+     audio = input_audio_path2wav[input_audio_path]
+     f0, t = pyworld.harvest(
+         audio,
+         fs=fs,
+         f0_ceil=f0max,
+         f0_floor=f0min,
+         frame_period=frame_period,
+     )
+     f0 = pyworld.stonemask(audio, f0, t, fs)
+     return f0
+
+
+ def change_rms(data1, sr1, data2, sr2, rate):  # data1 is the input audio, data2 is the output audio, rate is the weight given to data2
+     # print(data1.max(),data2.max())
+     rms1 = librosa.feature.rms(
+         y=data1, frame_length=sr1 // 2 * 2, hop_length=sr1 // 2
+     )  # one point every half second
+     rms2 = librosa.feature.rms(y=data2, frame_length=sr2 // 2 * 2, hop_length=sr2 // 2)
+     rms1 = torch.from_numpy(rms1)
+     rms1 = F.interpolate(
+         rms1.unsqueeze(0), size=data2.shape[0], mode="linear"
+     ).squeeze()
+     rms2 = torch.from_numpy(rms2)
+     rms2 = F.interpolate(
+         rms2.unsqueeze(0), size=data2.shape[0], mode="linear"
+     ).squeeze()
+     rms2 = torch.max(rms2, torch.zeros_like(rms2) + 1e-6)
+     data2 *= (
+         torch.pow(rms1, torch.tensor(1 - rate))
+         * torch.pow(rms2, torch.tensor(rate - 1))
+     ).numpy()
+     return data2
+
+
+ class VC(object):
+     def __init__(self, tgt_sr, config):
+         self.x_pad, self.x_query, self.x_center, self.x_max, self.is_half = (
+             config.x_pad,
+             config.x_query,
+             config.x_center,
+             config.x_max,
+             config.is_half,
+         )
+         self.sr = 16000  # HuBERT input sample rate
+         self.window = 160  # samples per frame
+         self.t_pad = self.sr * self.x_pad  # padding added before and after each segment
+         self.t_pad_tgt = tgt_sr * self.x_pad
+         self.t_pad2 = self.t_pad * 2
+         self.t_query = self.sr * self.x_query  # search range on either side of a candidate cut point
+         self.t_center = self.sr * self.x_center  # spacing of the candidate cut points
+         self.t_max = self.sr * self.x_max  # duration threshold below which no cut-point search is done
+         self.device = config.device
+
+     # Fork Feature: Get the best torch device to use for f0 algorithms that require a torch device. Returns a torch.device
+     def get_optimal_torch_device(self, index: int = 0) -> torch.device:
+         # Get cuda device
+         if torch.cuda.is_available():
+             return torch.device(
+                 f"cuda:{index % torch.cuda.device_count()}"
+             )  # Very fast
+         elif torch.backends.mps.is_available():
+             return torch.device("mps")
+         # Insert an else here to grab "xla" devices if available. TO DO later. Requires the torch_xla.core.xla_model library
+         # Otherwise return the "cpu" as a torch device
+         return torch.device("cpu")
+
+     # Fork Feature: Compute f0 with the crepe method
+     def get_f0_crepe_computation(
+         self,
+         x,
+         f0_min,
+         f0_max,
+         p_len,
+         hop_length=160,  # 512 before. Hop length controls how quickly the voice can jump to a dramatically different pitch. Lower hop lengths mean more pitch accuracy but longer inference time.
+         model="full",  # Either use crepe-tiny "tiny" or crepe "full". Default is full
+     ):
+         x = x.astype(
+             np.float32
+         )  # fixes the F.conv2D exception. We needed to convert double to float.
+         x /= np.quantile(np.abs(x), 0.999)
+         torch_device = self.get_optimal_torch_device()
+         audio = torch.from_numpy(x).to(torch_device, copy=True)
+         audio = torch.unsqueeze(audio, dim=0)
+         if audio.ndim == 2 and audio.shape[0] > 1:
+             audio = torch.mean(audio, dim=0, keepdim=True).detach()
+         audio = audio.detach()
+         print("Initiating prediction with a crepe_hop_length of: " + str(hop_length))
+         pitch: Tensor = torchcrepe.predict(
+             audio,
+             self.sr,
+             hop_length,
+             f0_min,
+             f0_max,
+             model,
+             batch_size=hop_length * 2,
+             device=torch_device,
+             pad=True,
+         )
+         p_len = p_len or x.shape[0] // hop_length
+         # Resize the pitch for final f0
+         source = np.array(pitch.squeeze(0).cpu().float().numpy())
+         source[source < 0.001] = np.nan
+         target = np.interp(
+             np.arange(0, len(source) * p_len, len(source)) / p_len,
+             np.arange(0, len(source)),
+             source,
+         )
+         f0 = np.nan_to_num(target)
+         return f0  # Resized f0
+
+     def get_f0_official_crepe_computation(
+         self,
+         x,
+         f0_min,
+         f0_max,
+         model="full",
+     ):
+         # Pick a batch size that doesn't cause memory errors on your gpu
+         batch_size = 512
+         # Compute pitch using first gpu
+         audio = torch.tensor(np.copy(x))[None].float()
+         f0, pd = torchcrepe.predict(
+             audio,
+             self.sr,
+             self.window,
+             f0_min,
+             f0_max,
+             model,
+             batch_size=batch_size,
+             device=self.device,
+             return_periodicity=True,
+         )
+         pd = torchcrepe.filter.median(pd, 3)
+         f0 = torchcrepe.filter.mean(f0, 3)
+         f0[pd < 0.1] = 0
+         f0 = f0[0].cpu().numpy()
+         return f0
+
+     # Fork Feature: Compute pYIN f0 method
+     def get_f0_pyin_computation(self, x, f0_min, f0_max):
+         # Analyze the audio passed in rather than a hard-coded sample file
+         f0, _, _ = librosa.pyin(x, sr=self.sr, fmin=f0_min, fmax=f0_max)
+         f0 = f0[1:]  # Get rid of extra first frame
+         return f0
+
+     # Fork Feature: Acquire median hybrid f0 estimation calculation
+     def get_f0_hybrid_computation(
+         self,
+         methods_str,
+         input_audio_path,
+         x,
+         f0_min,
+         f0_max,
+         p_len,
+         filter_radius,
+         crepe_hop_length,
+         time_step,
+     ):
+         # Get various f0 methods from input to use in the computation stack
+         s = methods_str
+         s = s.split("hybrid")[1]
+         s = s.replace("[", "").replace("]", "")
+         methods = s.split("+")
+         f0_computation_stack = []
+
+         print("Calculating f0 pitch estimations for methods: %s" % str(methods))
+         x = x.astype(np.float32)
+         x /= np.quantile(np.abs(x), 0.999)
+         # Get f0 calculations for all methods specified
+         for method in methods:
+             f0 = None
+             if method == "pm":
+                 f0 = (
+                     parselmouth.Sound(x, self.sr)
+                     .to_pitch_ac(
+                         time_step=time_step / 1000,
+                         voicing_threshold=0.6,
+                         pitch_floor=f0_min,
+                         pitch_ceiling=f0_max,
+                     )
+                     .selected_array["frequency"]
+                 )
+                 pad_size = (p_len - len(f0) + 1) // 2
+                 if pad_size > 0 or p_len - len(f0) - pad_size > 0:
+                     f0 = np.pad(
+                         f0, [[pad_size, p_len - len(f0) - pad_size]], mode="constant"
+                     )
+             elif method == "crepe":
+                 f0 = self.get_f0_official_crepe_computation(x, f0_min, f0_max)
+                 f0 = f0[1:]  # Get rid of extra first frame
+             elif method == "crepe-tiny":
+                 f0 = self.get_f0_official_crepe_computation(x, f0_min, f0_max, "tiny")
+                 f0 = f0[1:]  # Get rid of extra first frame
+             elif method == "mangio-crepe":
+                 f0 = self.get_f0_crepe_computation(
+                     x, f0_min, f0_max, p_len, crepe_hop_length
+                 )
+             elif method == "mangio-crepe-tiny":
+                 f0 = self.get_f0_crepe_computation(
+                     x, f0_min, f0_max, p_len, crepe_hop_length, "tiny"
+                 )
+             elif method == "harvest":
+                 f0 = cache_harvest_f0(input_audio_path, self.sr, f0_max, f0_min, 10)
+                 if filter_radius > 2:
+                     f0 = signal.medfilt(f0, 3)
+                 f0 = f0[1:]  # Get rid of first frame.
+             elif method == "dio":  # Potentially buggy?
+                 f0, t = pyworld.dio(
+                     x.astype(np.double),
+                     fs=self.sr,
+                     f0_ceil=f0_max,
+                     f0_floor=f0_min,
+                     frame_period=10,
+                 )
+                 f0 = pyworld.stonemask(x.astype(np.double), f0, t, self.sr)
+                 f0 = signal.medfilt(f0, 3)
+                 f0 = f0[1:]
+             # elif method == "pyin": Not Working just yet
+             #    f0 = self.get_f0_pyin_computation(x, f0_min, f0_max)
+             # Push method to the stack
+             f0_computation_stack.append(f0)
+
+         for fc in f0_computation_stack:
+             print(len(fc))
+
+         print("Calculating hybrid median f0 from the stack of: %s" % str(methods))
+         f0_median_hybrid = None
+         if len(f0_computation_stack) == 1:
+             f0_median_hybrid = f0_computation_stack[0]
+         else:
+             f0_median_hybrid = np.nanmedian(f0_computation_stack, axis=0)
+         return f0_median_hybrid
+
+     def get_f0(
+         self,
+         input_audio_path,
+         x,
+         p_len,
+         f0_up_key,
+         f0_method,
+         filter_radius,
+         crepe_hop_length,
+         inp_f0=None,
+     ):
+         global input_audio_path2wav
+         time_step = self.window / self.sr * 1000
+         f0_min = 50
+         f0_max = 1100
+         f0_mel_min = 1127 * np.log(1 + f0_min / 700)
+         f0_mel_max = 1127 * np.log(1 + f0_max / 700)
+         if f0_method == "pm":
+             f0 = (
+                 parselmouth.Sound(x, self.sr)
+                 .to_pitch_ac(
+                     time_step=time_step / 1000,
+                     voicing_threshold=0.6,
+                     pitch_floor=f0_min,
+                     pitch_ceiling=f0_max,
+                 )
+                 .selected_array["frequency"]
+             )
+             pad_size = (p_len - len(f0) + 1) // 2
+             if pad_size > 0 or p_len - len(f0) - pad_size > 0:
+                 f0 = np.pad(
+                     f0, [[pad_size, p_len - len(f0) - pad_size]], mode="constant"
+                 )
+         elif f0_method == "harvest":
+             input_audio_path2wav[input_audio_path] = x.astype(np.double)
+             f0 = cache_harvest_f0(input_audio_path, self.sr, f0_max, f0_min, 10)
+             if filter_radius > 2:
+                 f0 = signal.medfilt(f0, 3)
+         elif f0_method == "dio":  # Potentially Buggy?
+             f0, t = pyworld.dio(
+                 x.astype(np.double),
+                 fs=self.sr,
+                 f0_ceil=f0_max,
+                 f0_floor=f0_min,
+                 frame_period=10,
+             )
+             f0 = pyworld.stonemask(x.astype(np.double), f0, t, self.sr)
+             f0 = signal.medfilt(f0, 3)
+         elif f0_method == "crepe":
+             f0 = self.get_f0_official_crepe_computation(x, f0_min, f0_max)
+         elif f0_method == "crepe-tiny":
+             f0 = self.get_f0_official_crepe_computation(x, f0_min, f0_max, "tiny")
+         elif f0_method == "mangio-crepe":
+             f0 = self.get_f0_crepe_computation(
+                 x, f0_min, f0_max, p_len, crepe_hop_length
+             )
+         elif f0_method == "mangio-crepe-tiny":
+             f0 = self.get_f0_crepe_computation(
+                 x, f0_min, f0_max, p_len, crepe_hop_length, "tiny"
+             )
+         elif f0_method == "rmvpe":
+             if not hasattr(self, "model_rmvpe"):
+                 from rmvpe import RMVPE
+
+                 print("loading rmvpe model")
+                 self.model_rmvpe = RMVPE(
+                     "rmvpe.pt", is_half=self.is_half, device=self.device
+                 )
+             f0 = self.model_rmvpe.infer_from_audio(x, thred=0.03)
+
+         elif "hybrid" in f0_method:
+             # Perform hybrid median pitch estimation
+             input_audio_path2wav[input_audio_path] = x.astype(np.double)
+             f0 = self.get_f0_hybrid_computation(
+                 f0_method,
+                 input_audio_path,
+                 x,
+                 f0_min,
+                 f0_max,
+                 p_len,
+                 filter_radius,
+                 crepe_hop_length,
+                 time_step,
+             )
+
+         f0 *= pow(2, f0_up_key / 12)
+         # with open("test.txt","w")as f:f.write("\n".join([str(i)for i in f0.tolist()]))
+         tf0 = self.sr // self.window  # f0 points per second
+         if inp_f0 is not None:
+             delta_t = np.round(
+                 (inp_f0[:, 0].max() - inp_f0[:, 0].min()) * tf0 + 1
+             ).astype("int16")
+             replace_f0 = np.interp(
+                 list(range(delta_t)), inp_f0[:, 0] * 100, inp_f0[:, 1]
+             )
+             shape = f0[self.x_pad * tf0 : self.x_pad * tf0 + len(replace_f0)].shape[0]
+             f0[self.x_pad * tf0 : self.x_pad * tf0 + len(replace_f0)] = replace_f0[
+                 :shape
+             ]
+         # with open("test_opt.txt","w")as f:f.write("\n".join([str(i)for i in f0.tolist()]))
+         f0bak = f0.copy()
+         f0_mel = 1127 * np.log(1 + f0 / 700)
+         f0_mel[f0_mel > 0] = (f0_mel[f0_mel > 0] - f0_mel_min) * 254 / (
+             f0_mel_max - f0_mel_min
+         ) + 1
+         f0_mel[f0_mel <= 1] = 1
+         f0_mel[f0_mel > 255] = 255
+         # The np.int alias was removed in NumPy 1.24; the builtin int is equivalent
+         f0_coarse = np.rint(f0_mel).astype(int)
+
+         return f0_coarse, f0bak  # 1-0
+
+     def vc(
+         self,
+         model,
+         net_g,
+         sid,
+         audio0,
+         pitch,
+         pitchf,
+         times,
+         index,
+         big_npy,
+         index_rate,
+         version,
+         protect,
+     ):  # ,file_index,file_big_npy
+         feats = torch.from_numpy(audio0)
+         if self.is_half:
+             feats = feats.half()
+         else:
+             feats = feats.float()
+         if feats.dim() == 2:  # double channels
+             feats = feats.mean(-1)
+         assert feats.dim() == 1, feats.dim()
+         feats = feats.view(1, -1)
+         padding_mask = torch.BoolTensor(feats.shape).to(self.device).fill_(False)
+
+         inputs = {
+             "source": feats.to(self.device),
+             "padding_mask": padding_mask,
+             "output_layer": 9 if version == "v1" else 12,
+         }
+         t0 = ttime()
+         with torch.no_grad():
+             logits = model.extract_features(**inputs)
+             feats = model.final_proj(logits[0]) if version == "v1" else logits[0]
+         if protect < 0.5 and pitch is not None and pitchf is not None:
+             feats0 = feats.clone()
+         if index is not None and big_npy is not None and index_rate != 0:
+             npy = feats[0].cpu().numpy()
+             if self.is_half:
+                 npy = npy.astype("float32")
+
+             # _, I = index.search(npy, 1)
+             # npy = big_npy[I.squeeze()]
+
+             # Blend the 8 nearest index entries, weighted by inverse squared score
+             score, ix = index.search(npy, k=8)
+             weight = np.square(1 / score)
+             weight /= weight.sum(axis=1, keepdims=True)
+             npy = np.sum(big_npy[ix] * np.expand_dims(weight, axis=2), axis=1)
+
+             if self.is_half:
+                 npy = npy.astype("float16")
+             feats = (
+                 torch.from_numpy(npy).unsqueeze(0).to(self.device) * index_rate
+                 + (1 - index_rate) * feats
+             )
+
+         feats = F.interpolate(feats.permute(0, 2, 1), scale_factor=2).permute(0, 2, 1)
+         if protect < 0.5 and pitch is not None and pitchf is not None:
+             feats0 = F.interpolate(feats0.permute(0, 2, 1), scale_factor=2).permute(
+                 0, 2, 1
+             )
+         t1 = ttime()
+         p_len = audio0.shape[0] // self.window
+         if feats.shape[1] < p_len:
+             p_len = feats.shape[1]
+             if pitch is not None and pitchf is not None:
+                 pitch = pitch[:, :p_len]
+                 pitchf = pitchf[:, :p_len]
+
+         if protect < 0.5 and pitch is not None and pitchf is not None:
+             pitchff = pitchf.clone()
+             pitchff[pitchf > 0] = 1
+             pitchff[pitchf < 1] = protect
+             pitchff = pitchff.unsqueeze(-1)
+             feats = feats * pitchff + feats0 * (1 - pitchff)
+             feats = feats.to(feats0.dtype)
+         p_len = torch.tensor([p_len], device=self.device).long()
+         with torch.no_grad():
+             if pitch is not None and pitchf is not None:
+                 audio1 = (
+                     (net_g.infer(feats, p_len, pitch, pitchf, sid)[0][0, 0])
+                     .data.cpu()
+                     .float()
+                     .numpy()
+                 )
+             else:
+                 audio1 = (
+                     (net_g.infer(feats, p_len, sid)[0][0, 0]).data.cpu().float().numpy()
+                 )
+         del feats, p_len, padding_mask
+         if torch.cuda.is_available():
+             torch.cuda.empty_cache()
+         t2 = ttime()
+         times[0] += t1 - t0
+         times[2] += t2 - t1
+         return audio1
+
+     def pipeline(
+         self,
+         model,
+         net_g,
+         sid,
+         audio,
+         input_audio_path,
+         times,
+         f0_up_key,
+         f0_method,
+         file_index,
+         # file_big_npy,
+         index_rate,
+         if_f0,
+         filter_radius,
+         tgt_sr,
+         resample_sr,
+         rms_mix_rate,
+         version,
+         protect,
+         crepe_hop_length,
+         f0_file=None,
+     ):
+         if (
+             file_index != ""
+             # and file_big_npy != ""
+             # and os.path.exists(file_big_npy) == True
+             and os.path.exists(file_index)
+             and index_rate != 0
+         ):
+             try:
+                 index = faiss.read_index(file_index)
+                 # big_npy = np.load(file_big_npy)
+                 big_npy = index.reconstruct_n(0, index.ntotal)
+             except Exception:
+                 traceback.print_exc()
+                 index = big_npy = None
+         else:
+             index = big_npy = None
+         audio = signal.filtfilt(bh, ah, audio)
+         audio_pad = np.pad(audio, (self.window // 2, self.window // 2), mode="reflect")
+         opt_ts = []
+         if audio_pad.shape[0] > self.t_max:
+             # For long inputs, choose cut points where the windowed sum is closest to zero
+             audio_sum = np.zeros_like(audio)
+             for i in range(self.window):
+                 audio_sum += audio_pad[i : i - self.window]
+             for t in range(self.t_center, audio.shape[0], self.t_center):
+                 opt_ts.append(
+                     t
+                     - self.t_query
+                     + np.where(
+                         np.abs(audio_sum[t - self.t_query : t + self.t_query])
+                         == np.abs(audio_sum[t - self.t_query : t + self.t_query]).min()
+                     )[0][0]
+                 )
+         s = 0
+         audio_opt = []
+         t = None
+         t1 = ttime()
+         audio_pad = np.pad(audio, (self.t_pad, self.t_pad), mode="reflect")
+         p_len = audio_pad.shape[0] // self.window
+         inp_f0 = None
+         if hasattr(f0_file, "name"):
+             try:
+                 with open(f0_file.name, "r") as f:
+                     lines = f.read().strip("\n").split("\n")
+                 inp_f0 = []
+                 for line in lines:
+                     inp_f0.append([float(i) for i in line.split(",")])
+                 inp_f0 = np.array(inp_f0, dtype="float32")
+             except Exception:
+                 traceback.print_exc()
+         sid = torch.tensor(sid, device=self.device).unsqueeze(0).long()
+         pitch, pitchf = None, None
+         if if_f0 == 1:
+             pitch, pitchf = self.get_f0(
+                 input_audio_path,
+                 audio_pad,
+                 p_len,
+                 f0_up_key,
+                 f0_method,
+                 filter_radius,
+                 crepe_hop_length,
+                 inp_f0,
+             )
+             pitch = pitch[:p_len]
+             pitchf = pitchf[:p_len]
+             if self.device == "mps":
+                 pitchf = pitchf.astype(np.float32)
+             pitch = torch.tensor(pitch, device=self.device).unsqueeze(0).long()
+             pitchf = torch.tensor(pitchf, device=self.device).unsqueeze(0).float()
+         t2 = ttime()
+         times[1] += t2 - t1
+         for t in opt_ts:
+             t = t // self.window * self.window
+             if if_f0 == 1:
+                 audio_opt.append(
+                     self.vc(
+                         model,
+                         net_g,
+                         sid,
+                         audio_pad[s : t + self.t_pad2 + self.window],
+                         pitch[:, s // self.window : (t + self.t_pad2) // self.window],
+                         pitchf[:, s // self.window : (t + self.t_pad2) // self.window],
+                         times,
+                         index,
+                         big_npy,
+                         index_rate,
+                         version,
+                         protect,
+                     )[self.t_pad_tgt : -self.t_pad_tgt]
+                 )
+             else:
+                 audio_opt.append(
+                     self.vc(
+                         model,
+                         net_g,
+                         sid,
+                         audio_pad[s : t + self.t_pad2 + self.window],
+                         None,
+                         None,
+                         times,
+                         index,
+                         big_npy,
+                         index_rate,
+                         version,
+                         protect,
+                     )[self.t_pad_tgt : -self.t_pad_tgt]
+                 )
+             s = t
+         if if_f0 == 1:
+             audio_opt.append(
+                 self.vc(
+                     model,
+                     net_g,
+                     sid,
+                     audio_pad[t:],
+                     pitch[:, t // self.window :] if t is not None else pitch,
+                     pitchf[:, t // self.window :] if t is not None else pitchf,
+                     times,
+                     index,
+                     big_npy,
+                     index_rate,
+                     version,
+                     protect,
+                 )[self.t_pad_tgt : -self.t_pad_tgt]
+             )
+         else:
+             audio_opt.append(
+                 self.vc(
+                     model,
+                     net_g,
+                     sid,
+                     audio_pad[t:],
+                     None,
+                     None,
+                     times,
+                     index,
+                     big_npy,
+                     index_rate,
+                     version,
+                     protect,
+                 )[self.t_pad_tgt : -self.t_pad_tgt]
+             )
+         audio_opt = np.concatenate(audio_opt)
+         if rms_mix_rate != 1:
+             audio_opt = change_rms(audio, 16000, audio_opt, tgt_sr, rms_mix_rate)
+         if resample_sr >= 16000 and tgt_sr != resample_sr:
+             audio_opt = librosa.resample(
+                 audio_opt, orig_sr=tgt_sr, target_sr=resample_sr
+             )
+         # Scale to int16, reducing the gain only if the peak would clip
+         audio_max = np.abs(audio_opt).max() / 0.99
+         max_int16 = 32768
+         if audio_max > 1:
+             max_int16 /= audio_max
+         audio_opt = (audio_opt * max_int16).astype(np.int16)
+         del pitch, pitchf, sid
+         if torch.cuda.is_available():
+             torch.cuda.empty_cache()
+         return audio_opt
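The last step of `get_f0` turns each f0 value in Hz into a coarse integer bin from 1 to 255 on a mel scale. Here is a minimal scalar sketch of that mapping, assuming the same constants as the file above (50 Hz floor, 1100 Hz ceiling, `1127 * ln(1 + f / 700)`). `hz_to_mel` and `coarse_bin` are hypothetical helper names, and the code works on a single value rather than on a NumPy array.

```python
import math

F0_MIN = 50.0    # Hz, as in get_f0
F0_MAX = 1100.0  # Hz, as in get_f0


def hz_to_mel(f0):
    """Mel-scale value used by get_f0 (natural-log form)."""
    return 1127 * math.log(1 + f0 / 700)


MEL_MIN = hz_to_mel(F0_MIN)
MEL_MAX = hz_to_mel(F0_MAX)


def coarse_bin(f0):
    """Map an f0 in Hz to an integer bin in 1..255; unvoiced (0 Hz) maps to 1."""
    mel = hz_to_mel(f0)
    if mel > 0:
        # Rescale [MEL_MIN, MEL_MAX] linearly onto [1, 255]
        mel = (mel - MEL_MIN) * 254 / (MEL_MAX - MEL_MIN) + 1
    mel = min(max(mel, 1), 255)  # clamp out-of-range values
    return int(round(mel))


if __name__ == "__main__":
    for hz in (0, 50, 220, 440, 1100, 2000):
        print(hz, coarse_bin(hz))
```

Bin 1 therefore covers both unvoiced frames and anything at or below the 50 Hz floor, which is why the pipeline also keeps the unquantized `f0bak` values alongside the coarse bins.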