Gerard Muniesa committed
Commit: a5fbdd4
Parent(s): 640e286
[NEW] Add model Card, model files and data preprocessing files

Files changed:
- README.md +110 -3
- data_processing/README.md +40 -0
- data_processing/ca_multi2vckt.py +152 -0
- data_processing/extract_festcat.py +139 -0
- data_processing/extract_google_tts.py +168 -0
- data_processing/festcat_processing_test.sh +152 -0
- data_processing/google_tts_processing_test.sh +124 -0
- data_processing/process_data.sh +56 -0
- model/best_model.pth +3 -0
- model/config.json +262 -0
- model/speakers.pth +3 -0
README.md
CHANGED
@@ -1,3 +1,110 @@
# Aina Project's Catalan multi-speaker text-to-speech model

## Model description

This model was trained from scratch using the [Coqui TTS](https://github.com/coqui-ai/TTS) toolkit on a combination of 3 datasets: [Festcat](http://festcat.talp.cat/devel.php), [OpenSLR](http://openslr.org/69/) and [Common Voice](https://commonvoice.mozilla.org/ca). For training, 101,460 utterances from 257 speakers were used, corresponding to nearly 138 hours of speech. [Here](https://huggingface.co/spaces/projecte-aina/VITS_ca_multispeaker) you can find a demo of the model.

## Intended uses and limitations

You can use this model to generate synthetic speech in Catalan with different voices.

## How to use

### Usage

Required libraries:

```bash
pip install git+https://github.com/coqui-ai/TTS@dev#egg=TTS
```

Synthesize speech using Python:

```python
from TTS.utils.synthesizer import Synthesizer

model_path = "..."          # Absolute path to the model checkpoint (best_model.pth)
config_path = "..."         # Absolute path to the model config.json
speakers_file_path = "..."  # Absolute path to the speakers.pth file

text = "Text to synthesize"
speaker_idx = "Speaker ID"

synthesizer = Synthesizer(
    model_path, config_path, speakers_file_path, None, None, None,
)
wavs = synthesizer.tts(text, speaker_idx)
```
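
To keep the generated audio, the waveform returned by `tts` can be written to disk. A minimal sketch, assuming the `synthesizer` and `wavs` objects from the snippet above and that your installed Coqui TTS version exposes `Synthesizer.save_wav` (the helper used by the `tts` command line tool):

```python
# Write the synthesized waveform to a WAV file at the model's sample rate.
# "output.wav" is an arbitrary, illustrative output path.
synthesizer.save_wav(wavs, "output.wav")
```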

## Training

### Training Procedure

### Data preparation

The data was processed with the process_data.sh script, which reduces the sampling frequency of the audio, removes silences, adds padding and arranges the data in the format expected by the framework. You can find more information in [data_processing/README.md](data_processing/README.md).
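In practice (see the scripts under `data_processing/`), the per-file operations are a resample to 22.05 kHz with `ffmpeg -i in.wav -ar 22050 out.wav`, trimming of leading and trailing silence with `sox in.wav out.wav silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse`, and a short trailing pad with `sox in.wav out.wav pad 0 0.058`, where `in.wav`/`out.wav` are placeholder names.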

### Hyperparameters

The model is based on VITS, proposed by [Kim et al.](https://arxiv.org/abs/2106.06103). The following hyperparameters were set in the Coqui framework.

| Hyperparameter           | Value                 |
|--------------------------|-----------------------|
| Model                    | vits                  |
| Batch Size               | 16                    |
| Eval Batch Size          | 8                     |
| Mixed Precision          | false                 |
| Window Length            | 1024                  |
| Hop Length               | 256                   |
| FFT Size                 | 1024                  |
| Num Mels                 | 80                    |
| Phonemizer               | espeak                |
| Phoneme Language         | ca                    |
| Text Cleaners            | multilingual_cleaners |
| Formatter                | vctk_old              |
| Optimizer                | AdamW                 |
| Adam betas               | (0.8, 0.99)           |
| Adam eps                 | 1e-09                 |
| Adam weight decay        | 0.01                  |
| Learning Rate Gen        | 0.0001                |
| LR Scheduler Gen         | ExponentialLR         |
| LR Scheduler Gamma Gen   | 0.999875              |
| Learning Rate Disc       | 0.0001                |
| LR Scheduler Disc        | ExponentialLR         |
| LR Scheduler Gamma Disc  | 0.999875              |

The model was trained for 730,962 steps.

## Additional information

### Author

Text Mining Unit (TeMU) at the Barcelona Supercomputing Center ([email protected])

### Contact information

For further information, send an email to [email protected]

### Copyright

Copyright (c) 2022 Text Mining Unit at Barcelona Supercomputing Center

### Licensing Information

[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Funding

This work was funded by the [Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en)) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).

## Disclaimer

<details>
<summary>Click to expand</summary>

The models published in this repository are intended for a generalist purpose and are available to third parties. These models may have bias and/or any other undesirable distortions.

When third parties deploy or provide systems and/or services to other parties using any of these models (or using systems based on these models), or become users of the models, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of Artificial Intelligence.

In no event shall the owner and creator of the models (BSC – Barcelona Supercomputing Center) be liable for any results arising from the use made by third parties of these models.

</details>
data_processing/README.md
ADDED
@@ -0,0 +1,40 @@
# Data preparation

Scripts to process the [festcat](http://festcat.talp.cat/devel.php) and [google_tts](http://openslr.org/69/) datasets, to make them compatible with the training of modern TTS architectures.

## Requirements

`sox`, `ffmpeg`

### Processing steps

#### Downloads

Download [festcat](http://festcat.talp.cat/devel.php) and [google_tts](http://openslr.org/69/).

#### Variable definitions

Open the shell script `.../data_processing/process_data.sh` and modify the following fields:

```bash
### Festcat variables ###
export PATH_TO_FESTCAT_SHELL='.../data_processing/festcat_processing_test.sh' # Absolute path to the festcat_processing_test.sh script
export PATH_TO_FESTCAT_PY='.../data_processing/extract_festcat.py'            # Absolute path to the extract_festcat.py script
export PATH_TO_FESTCAT_DATA='.../festcat/'                                     # Path to the Festcat dataset
export FESTCAT_FINAL_PATH='.../festcat_processed'                              # Path where the preprocessed Festcat data will be stored

### Google_tts variables ###
export PATH_TO_GOOGLE_TTS_SHELL='.../data_processing/google_tts_processing_test.sh' # Absolute path to the google_tts_processing_test.sh script
export PATH_TO_GOOGLE_TTS_PY='.../data_processing/extract_google_tts.py'            # Absolute path to the extract_google_tts.py script
export PATH_TO_GOOGLE_TTS_DATA='.../google_tts'                                     # Path to the Google TTS dataset
export GOOGLE_TTS_FINAL_PATH='.../google_tts_processed'                             # Path where the preprocessed Google TTS data will be stored

### General variables ###
export VCTK_FORMATER_PATH='.../data_processing/ca_multi2vckt.py' # Absolute path to the ca_multi2vckt.py script
export FINAL_PATH='.../multispeaker_ca_test/'                    # Path where the preprocessed, VCTK-formatted datasets will be stored
```

#### Run preprocessing

Once the variables are correctly defined, execute the following command in the terminal:

`sh <...>/data_processing/process_data.sh`

The processed data in VCTK format will be placed in the directory defined in `export FINAL_PATH='.../multispeaker_ca_test/'`.
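
For reference, the `ca_multi2vckt.py` converter shown below writes one text file and one audio file per utterance, grouped by speaker, so the contents of `FINAL_PATH` end up looking roughly like this (speaker and utterance IDs are illustrative placeholders):

```
multispeaker_ca_test/
├── txt/
│   └── <speaker_id>/
│       └── <utterance_id>.txt
└── wav/
    └── <speaker_id>/
        └── <utterance_id>.wav
```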
data_processing/ca_multi2vckt.py
ADDED
@@ -0,0 +1,152 @@
import os
import argparse
from glob import glob
from subprocess import call

def main():
    my_parser = argparse.ArgumentParser()
    my_parser.add_argument('--google-path',
                           metavar='path',
                           type=str,
                           help='path to the preprocessed google_tts data')
    my_parser.add_argument('--festcat-path',
                           metavar='path',
                           type=str,
                           help='path to the preprocessed festcat data')
    #my_parser.add_argument('--cv-path',
    #                       metavar='path',
    #                       type=str,
    #                       help='path to the preprocessed common voice data')
    my_parser.add_argument('--final-path',
                           metavar='path',
                           type=str,
                           help='path where the vctk-formatted dataset will be written')
    args = my_parser.parse_args()
    google_path = args.google_path
    festcat_path = args.festcat_path
    #common_voice_path = args.cv_path
    target_base_path = args.final_path

    google_tts_male = google_path + "/male/"
    google_tts_female = google_path + "/female/"
    google_tts_paths = [google_tts_male, google_tts_female]

    if os.path.exists(google_path):
        print("Converting google_tts data to vctk format")
        convert_google(google_tts_paths, target_base_path)
    else:
        print("Google_tts processed data not found")

    if os.path.exists(festcat_path):
        print("Converting festcat data to vctk format")
        convert_festcat(festcat_path, target_base_path)
    else:
        print("Festcat processed data not found")

    #convert_cv(common_voice_path, target_base_path)

def convert_google(google_tts_paths, target_base_path):
    # Process both the male and female subsets.
    for g_path in google_tts_paths:
        meta_files = glob(f"{g_path}/*_*.txt")
        for meta_file in meta_files:
            print(meta_file)
            for line in open(meta_file).readlines():
                text_id, text = line.strip().split('|')
                # strip inverted punctuation marks
                text = text.replace('¿', '').replace('¡', '')
                speaker_id = text_id.split('_')[1]
                target_text_file = os.path.join(target_base_path, 'txt',
                                                speaker_id, text_id + '.txt')
                target_wav_file = os.path.join(target_base_path, 'wav',
                                               speaker_id, text_id + '.wav')
                source_wav_file = os.path.join(g_path, 'wavs', text_id + '.wav')

                speaker_paths = [os.path.dirname(target_text_file),
                                 os.path.dirname(target_wav_file)]

                convert_meta(target_text_file, target_wav_file,
                             source_wav_file, speaker_paths, text)

def convert_meta(target_text_file,
                 target_wav_file,
                 source_wav_file,
                 speaker_paths, text):

    # create the per-speaker directories
    for speaker_path in speaker_paths:
        if not os.path.isdir(speaker_path):
            os.mkdir(speaker_path)

    # write the text file
    with open(target_text_file, 'w') as out:
        out.write(text)

    # copy the wav file
    if not os.path.isfile(source_wav_file):
        raise IOError('{} does not exist'.format(source_wav_file))

    cp_args = ['cp', source_wav_file, target_wav_file]
    if not os.path.isfile(target_wav_file):
        call(cp_args)

def convert_festcat(festcat_path, target_base_path):
    meta_files = glob(f"{festcat_path}/*/*_train.txt")
    for meta_file in meta_files:
        speaker_name = meta_file.split(os.sep)[-2]
        print(meta_file)
        for line in open(meta_file).readlines():
            if '[' not in line:
                text_id, text = line.strip().split('|')
                # strip inverted punctuation marks
                text = text.replace('¿', '').replace('¡', '')
                speaker_id = speaker_name
                target_text_file = os.path.join(target_base_path, 'txt',
                                                speaker_id, text_id + '.txt')
                target_wav_file = os.path.join(target_base_path, 'wav',
                                               speaker_id, text_id + '.wav')
                source_wav_file = os.path.join(festcat_path, speaker_name,
                                               'wavs', text_id + '.wav')

                speaker_paths = [os.path.dirname(target_text_file),
                                 os.path.dirname(target_wav_file)]

                convert_meta(target_text_file, target_wav_file,
                             source_wav_file, speaker_paths, text)
            else:
                print('line: {} skipped'.format(line))

def convert_cv(common_voice_path, target_base_path):
    meta_files = glob(f"{common_voice_path}/*.txt")
    for meta_file in meta_files:
        print(meta_file)
        speaker_id = meta_file.split(os.sep)[-1].replace("ca_", "").replace(".txt", "")
        for line in open(meta_file).readlines():
            text_id, text = line.strip().split('|')

            target_text_file = os.path.join(target_base_path, 'txt',
                                            speaker_id, text_id + '.txt')
            target_wav_file = os.path.join(target_base_path, 'wav',
                                           speaker_id, text_id + '.wav')
            source_wav_file = os.path.join(common_voice_path,
                                           'wavs', text_id + '.wav')

            speaker_paths = [os.path.dirname(target_text_file),
                             os.path.dirname(target_wav_file)]

            convert_meta(target_text_file, target_wav_file,
                         source_wav_file, speaker_paths, text)

if __name__ == "__main__":
    main()
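
This converter is normally not run by hand: `process_data.sh` invokes it as `python ${VCTK_FORMATER_PATH} --google-path ${GOOGLE_TTS_FINAL_PATH} --festcat-path ${FESTCAT_FINAL_PATH} --final-path ${FINAL_PATH}` once both per-dataset preprocessing scripts have finished.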
data_processing/extract_festcat.py
ADDED
@@ -0,0 +1,139 @@
import os
import re
import json
import argparse
import logging

logger = logging.getLogger(__name__)

def main():
    my_parser = argparse.ArgumentParser()
    my_parser.add_argument('--utterance-path',
                           metavar='path',
                           type=str,
                           help='the path to the utterance files')
    my_parser.add_argument('--wavs-path',
                           metavar='path',
                           type=str,
                           help='the path to the wav files')
    my_parser.add_argument('--locutors',
                           metavar='N',
                           type=str,
                           help='list of speaker names/ids separated by commas')
    args = my_parser.parse_args()
    locutors = args.locutors
    locutors = locutors.replace(" ", "")
    locutors = locutors.split(",")
    utterance_path = args.utterance_path
    wavs_path = args.wavs_path

    for locutor in locutors:
        # get durations
        durations = get_durations_dict(wavs_path + '%s_sil_stats.csv' % locutor)
        aggregate_duration = 0
        rejected_duration = 0
        large_duration = 0
        total_duration = 0
        path = 'upc_ca_%s_utt/utt' % locutor
        path = utterance_path + path

        files = []
        long_files = []
        for filename in os.listdir(path):
            sentence = get_sentence(os.path.join(path, filename))
            audio_filename = filename.replace('.utt', '.wav')  # e.g. upc_ca_pep_203479.wav
            if sentence:
                target_path = 'upc_ca_%s_wav_22k_sil_pad' % locutor
                target_path = wavs_path + target_path
                source_filename = 'upc_ca_%s_wav_22k_sil/' % locutor + audio_filename
                source_filename = wavs_path + source_filename
                total_duration += durations[audio_filename]

                if os.path.isfile(source_filename):
                    if durations[audio_filename] < 10.0:
                        aggregate_duration += durations[audio_filename]
                        files.append((os.path.join(target_path, audio_filename), sentence))
                    else:
                        long_files.append((audio_filename, sentence))
                        large_duration += durations[audio_filename]
                else:
                    print(audio_filename)
            else:
                rejected_duration += durations[audio_filename]
        out(args, locutor, files)
        out_long(args, locutor, long_files)
        out_long_json(args, locutor, long_files)
        print(locutor, aggregate_duration / 3600, 'hours')
        print(locutor, 'rejected due to duration', large_duration / 3600, 'hours')
        print(locutor, 'rejected', rejected_duration / 60, 'minutes')
        print(locutor, total_duration, aggregate_duration + rejected_duration + large_duration)

def get_durations_dict(filename):
    durations = {}
    for line in open(filename).readlines():
        d = line.split(',')
        durations[d[0].split('/')[-1]] = float(d[1])
    return durations

def get_sentence(filename):
    utt_all = open(filename, encoding="ISO-8859-1").read()
    m = re.search('(\"\\\\\")(.+)(\\\\\"\")', utt_all)
    sentence = m.groups()[1]
    # delete interword dashes
    sentence = re.sub('-(?=([A-Z]))', ' ', sentence)
    # reject sentences containing digits
    if not re.search('\d', sentence):
        return sentence
    else:
        return None

def out(args, locutor, files):
    # All files go to the train split; test and val lists are left empty.
    outname_length = [('upc_%s_test.txt' % locutor, 0),
                      ('upc_%s_val.txt' % locutor, 0),
                      ('upc_%s_train.txt' % locutor, len(files))]
    l_sum = sum([el[1] for el in outname_length])
    if len(files) != l_sum:
        msg = 'train vs test val distribution wrong: %i' % l_sum
        raise ValueError(msg)

    for fout, l in outname_length:
        open((args.wavs_path + fout), mode='a').close()
        logger.warning(f"fout: {fout}")
        logger.warning(f"l: {l}")
        with open((args.wavs_path + fout), 'w') as out:
            for i in range(l):
                f, sentence = files.pop()
                out.write('%s|%s\n' % (f.split("/")[-1].split(".")[-2], sentence))

def out_long(args, locutor, files):
    outname = '%s_longsentences.csv' % locutor
    outname_path = args.wavs_path + outname
    open(outname_path, mode='a').close()
    with open(outname_path, 'w') as out:
        for audio, text in files:
            out.write('%s,"%s"\n' % (audio, text))

def out_long_json(args, locutor, files):
    outname = '%s_longsentences.json' % locutor
    source = args.wavs_path + 'upc_ca_%s_wav_22k_sil/' % locutor
    outname_path = args.wavs_path + outname
    open(outname_path, mode='a').close()
    interventions = []
    for audio, text in files:
        intervention = {}
        intervention['text'] = [(locutor, text)]
        intervention['urls'] = [(locutor, os.path.join(source, audio))]
        interventions.append(intervention)

    with open(outname_path, 'w') as out:
        json.dump({'session': interventions}, out, indent=2)

if __name__ == "__main__":
    main()
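
`festcat_processing_test.sh` calls this extractor once per speaker, as `python ${EXTRACT_PATH} --wavs-path ${FINAL_PATH}/${SPEAKER_NAME}/ --utterance-path ${UTTERANCE_PATH} --locutors ${SPEAKER_NAME}`, after the per-speaker silence-duration CSV has been generated.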
data_processing/extract_google_tts.py
ADDED
@@ -0,0 +1,168 @@
import os
import re
import json
import argparse
import logging
import csv
import numpy as np

logger = logging.getLogger(__name__)

def main():
    my_parser = argparse.ArgumentParser()
    my_parser.add_argument('--tsv-path',
                           metavar='path',
                           type=str,
                           help='the path to the tsv files')
    my_parser.add_argument('--wavs-path',
                           metavar='path',
                           type=str,
                           help='the path to the wav files')
    my_parser.add_argument('--locutors',
                           metavar='N',
                           type=str,
                           help='list of speaker names/ids separated by commas')
    args = my_parser.parse_args()
    locutors = args.locutors
    locutors = locutors.replace(" ", "")
    locutors = locutors.split(",")
    tsv_path = args.tsv_path
    wavs_path = args.wavs_path

    for locutor in locutors:
        # get durations
        durations = get_durations_dict(wavs_path + '%s_sil_stats.csv' % locutor)
        aggregate_duration = 0
        rejected_duration = 0
        large_duration = 0
        total_duration = 0
        tsv_name = "line_index_%s.tsv" % locutor
        # build the per-locutor tsv path without mutating tsv_path itself
        locutor_tsv_path = tsv_path + tsv_name

        tsv_file = open(locutor_tsv_path)
        read_tsv = csv.reader(tsv_file, delimiter="\t")
        files = []
        long_files = []
        for row in read_tsv:
            audio_filename = row[0] + ".wav"
            sentence = row[-1]
            if sentence:
                target_path = 'ca_es_%s_22k_sil_pad' % locutor
                target_path = wavs_path + target_path
                source_filename = 'ca_es_%s_22k_sil/' % locutor + audio_filename
                source_filename = wavs_path + source_filename
                total_duration += durations[audio_filename]
                if os.path.isfile(source_filename):
                    if durations[audio_filename] < 10.0:
                        aggregate_duration += durations[audio_filename]
                        files.append((os.path.join(target_path, audio_filename), sentence))
                    else:
                        long_files.append((audio_filename, sentence))
                        large_duration += durations[audio_filename]
                else:
                    print(audio_filename)
            else:
                rejected_duration += durations[audio_filename]

        speakers_id = find_speakers_id(wavs_path + '%s_sil_stats.csv' % locutor)
        for id in speakers_id:
            speaker_file = files_spliter(files=files, speaker_id=id)
            if len(speaker_file) == 0:
                continue
            else:
                out(args, speaker_id=id, files=speaker_file)
        out_long(args, locutor, long_files)
        out_long_json(args, locutor, long_files)
        print(locutor, aggregate_duration / 3600, 'hours')
        print(locutor, 'rejected due to duration', large_duration / 3600, 'hours')
        print(locutor, 'rejected', rejected_duration / 60, 'minutes')
        print(locutor, total_duration, aggregate_duration + rejected_duration + large_duration)

def get_durations_dict(filename):
    durations = {}
    for line in open(filename).readlines():
        d = line.split(',')
        durations[d[0].split('/')[-1]] = float(d[1])
    return durations

def get_sentence(filename):
    utt_all = open(filename, encoding="ISO-8859-1").read()
    m = re.search('(\"\\\\\")(.+)(\\\\\"\")', utt_all)
    sentence = m.groups()[1]
    # delete interword dashes
    sentence = re.sub('-(?=([A-Z]))', ' ', sentence)
    if not re.search('\d', sentence):
        return sentence
    else:
        print(filename, sentence)
        return None

def out(args, speaker_id, files):
    # All files go to the train split; test and val lists are left empty.
    outname_length = [('ca_%s_test.txt' % speaker_id, 0),
                      ('ca_%s_val.txt' % speaker_id, 0),
                      ('ca_%s_train.txt' % speaker_id, len(files))]
    l_sum = sum([el[1] for el in outname_length])
    if len(files) != l_sum:
        msg = 'train vs test val distribution wrong: %i' % l_sum
        raise ValueError(msg)

    for fout, l in outname_length:
        open((args.wavs_path + fout), mode='a').close()
        with open((args.wavs_path + fout), 'w') as out:
            for i in range(l):
                f, sentence = files.pop()
                out.write('%s|%s\n' % (f.split("/")[-1].split(".")[-2], sentence))
    print(len(files))

def out_long(args, locutor, files):
    outname = '%s_longsentences.csv' % locutor
    outname_path = args.wavs_path + outname
    open(outname_path, mode='a').close()
    with open(outname_path, 'w') as out:
        for audio, text in files:
            out.write('%s,"%s"\n' % (audio, text))

def out_long_json(args, locutor, files):
    outname = '%s_longsentences.json' % locutor
    source = args.wavs_path + 'ca_es_%s_22k_sil/' % locutor
    outname_path = args.wavs_path + outname
    open(outname_path, mode='a').close()
    interventions = []
    for audio, text in files:
        intervention = {}
        intervention['text'] = [(locutor, text)]
        intervention['urls'] = [(locutor, os.path.join(source, audio))]
        interventions.append(intervention)

    with open(outname_path, 'w') as out:
        json.dump({'session': interventions}, out, indent=2)

def find_speakers_id(path_tsv):
    # Collect the unique speaker ids from the per-locutor silence-stats csv.
    durations = {}
    for line in open(path_tsv).readlines():
        d = line.split(',')
        durations[d[0].split('/')[-1]] = float(d[1])
    keysList = list(durations.keys())
    for index in range(len(keysList)):
        keysList[index] = keysList[index].split("_")[1]
    keysList = np.ndarray.tolist(np.unique(np.array(keysList)))
    return keysList

def files_spliter(files, speaker_id):
    # Keep only the entries that belong to the given speaker id.
    out_file = []
    for element in files:
        if element[0].split("/")[-1].split("_")[1] == speaker_id:
            out_file.append(element)
    return out_file

if __name__ == "__main__":
    main()
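
`google_tts_processing_test.sh` calls this extractor once per subset (`male`, `female`), as `python ${EXTRACT_PATH} --wavs-path ${FINAL_PATH}/${SPEAKER_NAME}/ --tsv-path ${SOURCE_PATH}/${SPEAKER_NAME}/ --locutors ${SPEAKER_NAME}`; per-speaker train/val/test lists are then derived from the speaker IDs found in the silence-stats CSV.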
data_processing/festcat_processing_test.sh
ADDED
@@ -0,0 +1,152 @@
#!/bin/sh

export FINAL_PATH=$1
export SOURCE_PATH=$2
export EXTRACT_PATH=$3

module load gcc/8.3.0 cuda/10.2 cudnn/7.6.4 nccl/2.4.8 tensorrt/6.0.1 openmpi/4.0.1 atlas scalapack/2.0.2 fftw/3.3.8 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 python/3.7.4_ML torch/1.9.0a0 fairseq/2021-10-04 llvm/10.0.1 mecab/0.996

for name in bet eli eva jan mar ona pau pep pol teo uri
do
    echo "Processing $name data"
    export SPEAKER_NAME=$name
    export OUTPUT_CSV="${FINAL_PATH}/${SPEAKER_NAME}/${SPEAKER_NAME}_sil_stats.csv"
    export UTTERANCE_PATH="${SOURCE_PATH}/${SPEAKER_NAME}/"

    # Create the output directories if they do not exist yet
    if [ -d "${FINAL_PATH}" ]; then
        echo "Path ${FINAL_PATH} already created"
    else
        mkdir ${FINAL_PATH}
        echo "Creating: ${FINAL_PATH}"
    fi

    if [ -d "${FINAL_PATH}/${SPEAKER_NAME}" ]; then
        echo "Path ${FINAL_PATH}/${SPEAKER_NAME} already created"
    else
        mkdir ${FINAL_PATH}/${SPEAKER_NAME}
        echo "Creating: ${FINAL_PATH}/${SPEAKER_NAME}"
    fi

    if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav" ]; then
        echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav already created"
    else
        mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav
        echo "Creating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav"
    fi

    # Convert the raw 48 kHz recordings to .wav
    if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/)" ]; then
        i=1
        sp="/-\|"
        for f in ${SOURCE_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_raw/recordings/*.raw; do
            t=${f%.raw}.wav; g=${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/${t##*/}; sox -t raw -r 48k -e signed -b 16 -c 1 $f $g;
            printf "\r Converting .raw audios to .wav ${sp:i++%${#sp}:1}"
            sleep 0.05
        done
    else
        echo "Already converted to .wav"
    fi

    if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k" ]; then
        echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k already created"
    else
        mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k
        echo "Creating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k"
    fi

    # Downsample from 48 kHz to 22.05 kHz
    if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/)" ]; then
        i=1
        sp="/-\|"
        for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/*.wav; do
            t=${f##*/}; ffmpeg -i $f -ar 22050 ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/$t -v error < /dev/null;
            printf "\r Converting 48kHz audios to 22kHz ${sp:i++%${#sp}:1}"
            sleep 0.05
        done;
    else
        echo "Already converted to 22kHz"
    fi

    if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil" ]; then
        echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil already created"
    else
        mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil
        echo "Creating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil"
    fi

    # Trim leading and trailing silence
    if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/)" ]; then
        i=1
        sp="/-\|"
        for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/*.wav; do
            t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/$t silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse;
            printf "\r Filtering silence ${sp:i++%${#sp}:1}"
            sleep 0.05
        done
    else
        echo "Silence already eliminated"
    fi

    # Collect per-file durations into the silence-stats CSV
    if [ -f "${OUTPUT_CSV}" ]; then
        echo "${OUTPUT_CSV} already exists!"
    else
        echo "Creating ${OUTPUT_CSV}"
        for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/*.wav; do
            d=`ffprobe -i $f -show_entries format=duration -v quiet -of csv="p=0"`;
            echo $f,$d;
        done >> ${OUTPUT_CSV}
    fi

    # Create the train/val/test splits
    if [ -f "${FINAL_PATH}/${SPEAKER_NAME}/upc_${SPEAKER_NAME}_train.txt" ]; then
        echo "Splits already created!"
    else
        echo "Creating splits..."
        python ${EXTRACT_PATH} --wavs-path ${FINAL_PATH}/${SPEAKER_NAME}/ --utterance-path ${UTTERANCE_PATH} --locutors ${SPEAKER_NAME}
    fi

    # Add a short trailing pad and write the final wavs/ directory
    if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/wavs" ]; then
        echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/wavs already created"
    else
        mkdir ${FINAL_PATH}/${SPEAKER_NAME}/wavs
        echo "Creating: ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
    fi

    if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/wavs/)" ]; then
        i=1
        sp="/-\|"
        for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/*.wav; do
            t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/wavs/$t pad 0 0.058;
            printf "\r Adding pad ${sp:i++%${#sp}:1}"
            sleep 0.05
        done
    else
        echo "Pad already added!"
    fi

    # Remove the intermediate directories
    rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil
    rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k
    rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav

done
echo "Done!"
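
The script expects three positional arguments: `$1` the output root (`FESTCAT_FINAL_PATH`), `$2` the raw Festcat root (`PATH_TO_FESTCAT_DATA`) and `$3` the path to `extract_festcat.py`. It is sourced with exactly those values by `process_data.sh`; the `module load` line is specific to the cluster environment used for preprocessing.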
data_processing/google_tts_processing_test.sh
ADDED
@@ -0,0 +1,124 @@
#!/bin/sh

export FINAL_PATH=$1
export SOURCE_PATH=$2
export EXTRACT_PATH=$3

module load gcc/8.3.0 cuda/10.2 cudnn/7.6.4 nccl/2.4.8 tensorrt/6.0.1 openmpi/4.0.1 atlas scalapack/2.0.2 fftw/3.3.8 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 python/3.7.4_ML torch/1.9.0a0 fairseq/2021-10-04 llvm/10.0.1 mecab/0.996

for name in male female
do
    export SPEAKER_NAME=$name
    export OUTPUT_CSV="${FINAL_PATH}/${SPEAKER_NAME}/${SPEAKER_NAME}_sil_stats.csv"
    export UTTERANCE_PATH="${SOURCE_PATH}/${SPEAKER_NAME}/"

    # Create the output directories if they do not exist yet
    if [ -d "${FINAL_PATH}" ]; then
        echo "Path ${FINAL_PATH} already created"
    else
        mkdir ${FINAL_PATH}
        echo "Creating: ${FINAL_PATH}"
    fi

    if [ -d "${FINAL_PATH}/${SPEAKER_NAME}" ]; then
        echo "Path ${FINAL_PATH}/${SPEAKER_NAME} already created"
    else
        mkdir ${FINAL_PATH}/${SPEAKER_NAME}
        echo "Creating: ${FINAL_PATH}/${SPEAKER_NAME}"
    fi

    if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k" ]; then
        echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k already created"
    else
        mkdir ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k
        echo "Creating: ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k"
    fi

    # Downsample to 22.05 kHz
    if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/)" ]; then
        i=1
        sp="/-\|"
        for f in ${SOURCE_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}/*.wav; do
            t=${f##*/}; ffmpeg -i $f -ar 22050 ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/$t -v error < /dev/null;
            printf "\r Converting 48kHz audios to 22kHz ${sp:i++%${#sp}:1}"
            sleep 0.05
        done;
    else
        echo "Already converted to 22kHz"
    fi

    if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil" ]; then
        echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil already created"
    else
        mkdir ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil
        echo "Creating: ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil"
    fi

    # Trim leading and trailing silence
    if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/)" ]; then
        i=1
        sp="/-\|"
        for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/*.wav; do
            t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/$t silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse;
            printf "\r Filtering silence ${sp:i++%${#sp}:1}"
            sleep 0.05
        done
    else
        echo "Silence has already been filtered!"
    fi

    # Collect per-file durations into the silence-stats CSV
    if [ -f "${OUTPUT_CSV}" ]; then
        echo "${OUTPUT_CSV} already exists!"
    else
        echo "Creating ${OUTPUT_CSV}"
        for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/*.wav; do
            d=`ffprobe -i $f -show_entries format=duration -v quiet -of csv="p=0"`;
            echo $f,$d;
        done >> ${OUTPUT_CSV}
    fi

    # Create the per-speaker train/val/test splits
    # (one specific speaker's train list is used as a marker that splits already exist)
    if [ -f "${FINAL_PATH}/${SPEAKER_NAME}/ca_01591_train.txt" ]; then
        echo "Splits already created!"
    else
        echo "Creating splits..."
        python ${EXTRACT_PATH} --wavs-path ${FINAL_PATH}/${SPEAKER_NAME}/ --tsv-path ${SOURCE_PATH}/${SPEAKER_NAME}/ --locutors ${SPEAKER_NAME}
    fi

    # Add a short trailing pad and write the final wavs/ directory
    if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/wavs" ]; then
        echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/wavs already created"
    else
        mkdir ${FINAL_PATH}/${SPEAKER_NAME}/wavs
        echo "Creating: ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
    fi

    if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/wavs/)" ]; then
        i=1
        sp="/-\|"
        for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/*.wav; do
            t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/wavs/$t pad 0 0.058;
            printf "\r Adding pad ${sp:i++%${#sp}:1}"
            sleep 0.05
        done
    else
        echo "Pad already added!"
    fi

    # Remove the intermediate directories
    rm -r ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil
    rm -r ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k

done
echo "Done!"
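
Like its Festcat counterpart, this script takes `$1` the output root (`GOOGLE_TTS_FINAL_PATH`), `$2` the downloaded OpenSLR data root (`PATH_TO_GOOGLE_TTS_DATA`) and `$3` the path to `extract_google_tts.py`, and is sourced with those values by `process_data.sh`.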
data_processing/process_data.sh
ADDED
@@ -0,0 +1,56 @@
#!/bin/bash

### Festcat variables ###
export PATH_TO_FESTCAT_SHELL='/gpfs/scratch/bsc88/bsc88858/data_processing/festcat_processing_test.sh'
export PATH_TO_FESTCAT_PY='/gpfs/scratch/bsc88/bsc88858/data_processing/extract_festcat.py'
export PATH_TO_FESTCAT_DATA='/gpfs/scratch/bsc88/bsc88858/festcat/'
export FESTCAT_FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/festcat_processed'

### Google_tts variables ###
export PATH_TO_GOOGLE_TTS_SHELL='/gpfs/scratch/bsc88/bsc88858/data_processing/google_tts_processing_test.sh'
export PATH_TO_GOOGLE_TTS_PY='/gpfs/scratch/bsc88/bsc88858/data_processing/extract_google_tts.py'
export PATH_TO_GOOGLE_TTS_DATA='/gpfs/scratch/bsc88/bsc88858/google_tts'
export GOOGLE_TTS_FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/google_tts_processed'

### General variables ###
export VCTK_FORMATER_PATH='/gpfs/scratch/bsc88/bsc88858/data_processing/ca_multi2vckt.py'
export FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/multispeaker_ca_test/'

# Preprocess Festcat (skipped if the output directory already exists)
if [ -d "${FESTCAT_FINAL_PATH}" ]; then
    echo "Path ${FESTCAT_FINAL_PATH} already exists"
else
    if [ -d "${PATH_TO_FESTCAT_DATA}" ]; then
        source ${PATH_TO_FESTCAT_SHELL} ${FESTCAT_FINAL_PATH} ${PATH_TO_FESTCAT_DATA} ${PATH_TO_FESTCAT_PY}
    else
        echo "Festcat data not found!"
    fi
fi

# Preprocess Google TTS (skipped if the output directory already exists)
if [ -d "${GOOGLE_TTS_FINAL_PATH}" ]; then
    echo "Path ${GOOGLE_TTS_FINAL_PATH} already exists"
else
    if [ -d "${PATH_TO_GOOGLE_TTS_DATA}" ]; then
        source ${PATH_TO_GOOGLE_TTS_SHELL} ${GOOGLE_TTS_FINAL_PATH} ${PATH_TO_GOOGLE_TTS_DATA} ${PATH_TO_GOOGLE_TTS_PY}
    else
        echo "Google TTS data not found!"
    fi
fi

# Convert both preprocessed datasets to the VCTK layout expected by the trainer
if [ -d "${FINAL_PATH}" ]; then
    echo "Path ${FINAL_PATH} already created"
else
    mkdir ${FINAL_PATH}
    mkdir ${FINAL_PATH}/txt/
    mkdir ${FINAL_PATH}/wav/
    echo "Creating: ${FINAL_PATH}"
    python ${VCTK_FORMATER_PATH} --google-path ${GOOGLE_TTS_FINAL_PATH} --festcat-path ${FESTCAT_FINAL_PATH} --final-path ${FINAL_PATH}
fi

echo "Done!"
model/best_model.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b15fa7d2052bada1cf421e49d2d03b00e95b49fcd0e42b7af1d92da2880cdecc
size 1038659133
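
`best_model.pth` and `speakers.pth` are stored with Git LFS, so the repository tracks only the three-line pointer shown above; fetching the actual checkpoint (about 1 GB, per the `size` field) requires an LFS-aware clone, for example running `git lfs pull` after cloning.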
model/config.json
ADDED
@@ -0,0 +1,262 @@
{
  "output_path": "/gpfs/projects/bsc88/speech/tts/TTS_v0.8.0/recipes/multispeaker/experiments_from_previous",
  "logger_uri": null,
  "run_name": "multispeaker_vits_ca_1e4_1e4_32",
  "project_name": null,
  "run_description": "\ud83d\udc38Coqui trainer run.",
  "print_step": 25,
  "plot_step": 100,
  "model_param_stats": false,
  "wandb_entity": null,
  "dashboard_logger": "tensorboard",
  "log_model_step": 1000,
  "save_step": 1000,
  "save_n_checkpoints": 5,
  "save_checkpoints": true,
  "save_all_best": true,
  "save_best_after": 10000,
  "target_loss": null,
  "print_eval": true,
  "test_delay_epochs": -1,
  "run_eval": true,
  "run_eval_steps": null,
  "distributed_backend": "nccl",
  "distributed_url": "tcp://localhost:54321",
  "mixed_precision": false,
  "epochs": 1000,
  "batch_size": 16,
  "eval_batch_size": 8,
  "grad_clip": [1000.0, 1000.0],
  "scheduler_after_epoch": true,
  "lr": 0.001,
  "optimizer": "AdamW",
  "optimizer_params": {"betas": [0.8, 0.99], "eps": 1e-09, "weight_decay": 0.01},
  "lr_scheduler": "",
  "lr_scheduler_params": null,
  "use_grad_scaler": false,
  "cudnn_enable": true,
  "cudnn_deterministic": false,
  "cudnn_benchmark": false,
  "training_seed": 54321,
  "model": "vits",
  "num_loader_workers": 4,
  "num_eval_loader_workers": 4,
  "use_noise_augment": false,
  "audio": {
    "fft_size": 1024,
    "sample_rate": 22050,
    "win_length": 1024,
    "hop_length": 256,
    "num_mels": 80,
    "mel_fmin": 0,
    "mel_fmax": null
  },
  "use_phonemes": true,
  "phonemizer": "espeak",
  "phoneme_language": "ca",
  "compute_input_seq_cache": true,
  "text_cleaner": "multilingual_cleaners",
  "enable_eos_bos_chars": false,
  "test_sentences_file": "",
  "phoneme_cache_path": "/gpfs/projects/bsc88/speech/tts/TTS_v0.8.0/recipes/multispeaker/phoneme_cache",
  "characters": {
    "characters_class": "TTS.tts.utils.text.characters.IPAPhonemes",
    "vocab_dict": null,
    "pad": "<PAD>",
    "eos": "<EOS>",
    "bos": "<BOS>",
    "blank": "<BLNK>",
    "characters": "iy\u0268\u0289\u026fu\u026a\u028f\u028ae\u00f8\u0258\u0259\u0275\u0264o\u025b\u0153\u025c\u025e\u028c\u0254\u00e6\u0250a\u0276\u0251\u0252\u1d7b\u0298\u0253\u01c0\u0257\u01c3\u0284\u01c2\u0260\u01c1\u029bpbtd\u0288\u0256c\u025fk\u0261q\u0262\u0294\u0274\u014b\u0272\u0273n\u0271m\u0299r\u0280\u2c71\u027e\u027d\u0278\u03b2fv\u03b8\u00f0sz\u0283\u0292\u0282\u0290\u00e7\u029dx\u0263\u03c7\u0281\u0127\u0295h\u0266\u026c\u026e\u028b\u0279\u027bj\u0270l\u026d\u028e\u029f\u02c8\u02cc\u02d0\u02d1\u028dw\u0265\u029c\u02a2\u02a1\u0255\u0291\u027a\u0267\u02b2\u025a\u02de\u026b",
    "punctuations": "!'(),-.:;? ",
    "phonemes": null,
    "is_unique": false,
    "is_sorted": true
  },
  "add_blank": true,
  "batch_group_size": 5,
  "loss_masking": null,
  "min_audio_len": 1,
  "max_audio_len": Infinity,
  "min_text_len": 1,
  "max_text_len": 325,
  "compute_f0": false,
  "compute_linear_spec": true,
  "precompute_num_workers": 0,
  "start_by_longest": false,
  "datasets": [
    {
      "formatter": "vctk_old",
      "dataset_name": "vctk_old",
      "path": "/gpfs/scratch/bsc88/bsc88474/data/multispeaker_ca",
      "meta_file_train": "",
      "ignored_speakers": ["uri", "09796", "05450"],
      "language": "ca",
      "meta_file_val": "",
      "meta_file_attn_mask": ""
    }
  ],
  "test_sentences": [
    ["Per exemple, dels nostres bancs que inverteixen en armament de les nostres empreses."],
    ["Preguntin-se si aix\u00f2 era necessari."],
    ["La suposada ocultaci\u00f3 dels informes que advertien de risc s\u00edsmic."],
    ["\u00c9s de 633 milions d'euros quan es far\u00e0 la publicaci\u00f3 detallada."]
  ],
  "eval_split_max_size": null,
  "eval_split_size": 0.01,
  "use_speaker_weighted_sampler": false,
  "speaker_weighted_sampler_alpha": 1.0,
  "use_language_weighted_sampler": false,
  "language_weighted_sampler_alpha": 1.0,
  "use_length_weighted_sampler": false,
  "length_weighted_sampler_alpha": 1.0,
  "model_args": {
    "num_chars": 131,
    "out_channels": 513,
    "spec_segment_size": 32,
    "hidden_channels": 192,
    "hidden_channels_ffn_text_encoder": 768,
    "num_heads_text_encoder": 2,
    "num_layers_text_encoder": 6,
    "kernel_size_text_encoder": 3,
    "dropout_p_text_encoder": 0.1,
    "dropout_p_duration_predictor": 0.5,
    "kernel_size_posterior_encoder": 5,
    "dilation_rate_posterior_encoder": 1,
    "num_layers_posterior_encoder": 16,
    "kernel_size_flow": 5,
    "dilation_rate_flow": 1,
    "num_layers_flow": 4,
    "resblock_type_decoder": "1",
    "resblock_kernel_sizes_decoder": [3, 7, 11],
    "resblock_dilation_sizes_decoder": [[1, 3, 5], [1, 3, 5], [1, 3, 5]],
    "upsample_rates_decoder": [8, 8, 2, 2],
    "upsample_initial_channel_decoder": 512,
    "upsample_kernel_sizes_decoder": [16, 16, 4, 4],
    "periods_multi_period_discriminator": [2, 3, 5, 7, 11],
    "use_sdp": true,
    "noise_scale": 1.0,
    "inference_noise_scale": 0.667,
    "length_scale": 1.0,
    "noise_scale_dp": 1.0,
    "inference_noise_scale_dp": 1.0,
    "max_inference_len": null,
    "init_discriminator": true,
    "use_spectral_norm_disriminator": false,
    "use_speaker_embedding": true,
    "num_speakers": 257,
    "speakers_file": "/home/user/app/speakers.pth",
    "d_vector_file": null,
    "speaker_embedding_channels": 256,
    "use_d_vector_file": false,
    "d_vector_dim": 0,
    "detach_dp_input": true,
    "use_language_embedding": false,
    "embedded_language_dim": 4,
    "num_languages": 0,
    "language_ids_file": null,
    "use_speaker_encoder_as_loss": false,
    "speaker_encoder_config_path": "",
    "speaker_encoder_model_path": "",
    "condition_dp_on_speaker": true,
    "freeze_encoder": false,
    "freeze_DP": false,
    "freeze_PE": false,
    "freeze_flow_decoder": false,
    "freeze_waveform_decoder": false,
    "encoder_sample_rate": null,
    "interpolate_z": true,
    "reinit_DP": false,
    "reinit_text_encoder": false
  },
  "lr_gen": 0.0001,
  "lr_disc": 0.0001,
  "lr_scheduler_gen": "ExponentialLR",
  "lr_scheduler_gen_params": {"gamma": 0.999875, "last_epoch": -1},
  "lr_scheduler_disc": "ExponentialLR",
  "lr_scheduler_disc_params": {"gamma": 0.999875, "last_epoch": -1},
  "kl_loss_alpha": 1.0,
  "disc_loss_alpha": 1.0,
  "gen_loss_alpha": 1.0,
  "feat_loss_alpha": 1.0,
  "mel_loss_alpha": 45.0,
  "dur_loss_alpha": 1.0,
  "speaker_encoder_loss_alpha": 1.0,
  "return_wav": true,
  "use_weighted_sampler": false,
  "weighted_sampler_attrs": null,
  "weighted_sampler_multipliers": null,
  "r": 1,
  "num_speakers": 257,
  "use_speaker_embedding": true,
  "speakers_file": "/home/user/app/speakers.pth",
  "speaker_embedding_channels": 256,
  "language_ids_file": null,
  "use_language_embedding": false,
  "use_d_vector_file": false,
  "d_vector_file": null,
  "d_vector_dim": 0
}
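
The shipped configuration can be inspected programmatically. A minimal sketch, assuming the Coqui TTS `load_config` helper (already imported in the usage snippet of the model card) and, as an assumption, that `speakers.pth` is a regular torch-serialized mapping of speaker names to IDs:

```python
import torch
from TTS.config import load_config

# Parse the training configuration shipped with the model.
config = load_config("model/config.json")
print(config.model)              # "vits"
print(config.audio.sample_rate)  # 22050

# Assumption: speakers.pth stores the speaker-name -> ID mapping used at training time.
speakers = torch.load("model/speakers.pth")
print(len(speakers), "speakers")
```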
model/speakers.pth
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6dacda0b8dd3e111c5072f8f33c08b4a29b92ac79aaf22ceca912d01e7deb905
size 30191