PyTorch
Catalan
TTS
audio
synthesis
VITS
speech
coqui.ai
Gerard Muniesa committed on
Commit
a5fbdd4
1 Parent(s): 640e286

[NEW] Add model Card, model files and data preprocessing files

README.md CHANGED
@@ -1,3 +1,110 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ # Aina Project's Catalan multi-speaker text-to-speech model
2
+ ## Model description
3
+
4
+ This model was trained from scratch with the [Coqui TTS](https://github.com/coqui-ai/TTS) toolkit on a combination of three datasets: [Festcat](http://festcat.talp.cat/devel.php), [OpenSLR](http://openslr.org/69/) and [Common Voice](https://commonvoice.mozilla.org/ca). The training set comprises 101,460 utterances from 257 speakers, corresponding to nearly 138 hours of speech. A demo of the model is available [here](https://huggingface.co/spaces/projecte-aina/VITS_ca_multispeaker).
5
+
6
+ ## Intended uses and limitations
7
+
8
+ You can use this model to generate synthetic speech in Catalan with different voices.
9
+
10
+ ## How to use
11
+ ### Usage
12
+
13
+ Required libraries:
14
+
15
+ ```bash
16
+ pip install git+https://github.com/coqui-ai/TTS@dev#egg=TTS
17
+ ```
18
+
19
+ Synthesize speech using Python:
20
+
21
+ ```python
22
+ from TTS.utils.synthesizer import Synthesizer
32
+
33
+ model_path = "..."  # Absolute path to the model checkpoint.pth
34
+ config_path = "..."  # Absolute path to the model config.json
35
+ speakers_file_path = "..."  # Absolute path to the speakers.pth file
36
+
37
+ text = "Text to synthetize"
38
+ speaker_idx = "Speaker ID"
39
+
40
+ synthesizer = Synthesizer(
41
+ model_path, config_path, speakers_file_path, None, None, None,
42
+ )
43
+ wavs = synthesizer.tts(text, speaker_idx)
44
+ ```
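+ 
+ To write the generated audio to disk, a minimal follow-up (assuming the Coqui TTS `Synthesizer.save_wav` helper; `output.wav` is just a placeholder path) is:
+ 
+ ```python
+ # Continue from the snippet above: `wavs` holds the synthesized samples,
+ # and save_wav writes them at the synthesizer's output sample rate.
+ synthesizer.save_wav(wavs, "output.wav")
+ ```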
45
+
46
+
47
+ ## Training
48
+ ### Training Procedure
49
+ ### Data preparation
50
+ The data has been processed with the `process_data.sh` script in the `data_processing` folder, which reduces the sampling frequency of the audio, removes silences, adds padding and structures the data in the format expected by the framework. You can find more information in [data_processing/README.md](data_processing/README.md).
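+ 
+ For reference, the audio processing in that script boils down to `ffmpeg` and `sox` calls along the following lines (a sketch mirroring `data_processing/festcat_processing_test.sh`; the file names are placeholders):
+ 
+ ```bash
+ # Resample to 22.05 kHz (the training sample rate)
+ ffmpeg -i input.wav -ar 22050 resampled.wav
+ # Trim leading and trailing silence
+ sox resampled.wav trimmed.wav silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse
+ # Add a short trailing pad
+ sox trimmed.wav padded.wav pad 0 0.058
+ ```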
51
+
52
+ ### Hyperparameters
53
+
54
+ The model is based on VITS, proposed by [Kim et al.](https://arxiv.org/abs/2106.06103). The following hyperparameters were set in the Coqui framework.
55
+
56
+ | Hyperparameter | Value |
57
+ |------------------------------------|----------------------------------|
58
+ | Model | vits |
59
+ | Batch Size | 16 |
60
+ | Eval Batch Size | 8 |
61
+ | Mixed Precision | false |
62
+ | Window Length | 1024 |
63
+ | Hop Length | 256 |
64
+ | FFT size | 1024 |
65
+ | Num Mels | 80 |
66
+ | Phonemizer | espeak |
67
+ | Phoneme Language | ca |
68
+ | Text Cleaners | multilingual_cleaners |
69
+ | Formatter | vctk_old |
70
+ | Optimizer | AdamW |
71
+ | Adam betas | (0.8, 0.99) |
72
+ | Adam eps | 1e-09 |
73
+ | Adam weight decay | 0.01 |
74
+ | Learning Rate Gen | 0.0001 |
75
+ | LR Scheduler Gen | ExponentialLR |
76
+ | LR Scheduler Gamma Gen | 0.999875 |
77
+ | Learning Rate Disc | 0.0001 |
78
+ | LR Scheduler Disc | ExponentialLR |
79
+ | LR Scheduler Gamma Disc | 0.999875 |
80
+
81
+ The model was trained for 730,962 steps.
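+ 
+ As an illustration, the table above corresponds to a Coqui `VitsConfig` roughly like the following (a partial sketch assuming the Coqui TTS v0.8 Python API; the full configuration used for training is in `model/config.json`):
+ 
+ ```python
+ from TTS.tts.configs.vits_config import VitsConfig
+ 
+ config = VitsConfig(
+     batch_size=16,
+     eval_batch_size=8,
+     mixed_precision=False,
+     use_phonemes=True,
+     phonemizer="espeak",
+     phoneme_language="ca",
+     text_cleaner="multilingual_cleaners",
+     optimizer="AdamW",
+     optimizer_params={"betas": [0.8, 0.99], "eps": 1e-9, "weight_decay": 0.01},
+     lr_gen=1e-4,
+     lr_disc=1e-4,
+     lr_scheduler_gen="ExponentialLR",
+     lr_scheduler_gen_params={"gamma": 0.999875, "last_epoch": -1},
+     lr_scheduler_disc="ExponentialLR",
+     lr_scheduler_disc_params={"gamma": 0.999875, "last_epoch": -1},
+ )
+ ```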
82
+
83
+ ## Additional information
84
+
85
+ ### Author
86
+ Text Mining Unit (TeMU) at the Barcelona Supercomputing Center ([email protected])
87
+
88
+ ### Contact information
89
+ For further information, send an email to [email protected]
90
+
91
+ ### Copyright
92
+ Copyright (c) 2022 Text Mining Unit at Barcelona Supercomputing Center
93
+
94
+
95
+ ### Licensing Information
96
+ [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)
97
+
98
+ ### Funding
99
+ This work was funded by the [Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en)) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).
100
+
101
+
102
+ ## Disclaimer
103
+ <details>
104
+ <summary>Click to expand</summary>
105
+
106
+ The models published in this repository are intended for a generalist purpose and are made available to third parties. These models may contain bias and/or other undesirable distortions.
107
+
108
+ When third parties deploy or provide systems and/or services to other parties using any of these models (or systems based on these models), or become users of the models, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including those regarding the use of Artificial Intelligence.
109
+
110
+ In no event shall the owner and creator of the models (BSC – Barcelona Supercomputing Center) be liable for any results arising from the use made by third parties of these models.
data_processing/README.md ADDED
@@ -0,0 +1,40 @@
1
+ # Data preparation
2
+
3
+ Scripts to process the [festcat](http://festcat.talp.cat/devel.php) and [google_tts](http://openslr.org/69/) datasets and make them compatible with the training of modern TTS architectures.
4
+
5
+ ## Requirements
6
+ `sox`, `ffmpeg`
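+ 
+ On Debian/Ubuntu-based systems, for example, both tools can typically be installed with:
+ 
+ ```bash
+ sudo apt-get install sox ffmpeg
+ ```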
7
+
8
+ ### Processing steps
9
+
10
+ #### Downloads
11
+ Download [festcat](http://festcat.talp.cat/devel.php) and [google_tts](http://openslr.org/69/)
12
+
13
+ #### Variable definitions
14
+
15
+ Open the shell script `.../data_processing/process_data.sh` and modify the following fields:
16
+
17
+ ```bash
18
+ ### Festcat variables ###
19
+ export PATH_TO_FESTCAT_SHELL='.../data_processing/festcat_processing_test.sh' # Absolute path to festcat_processing_test.sh script
20
+ export PATH_TO_FESTCAT_PY='.../data_processing/extract_festcat.py' # Absolute path to extract_festcat.py script
21
+ export PATH_TO_FESTCAT_DATA='.../festcat/' # Path to Festcat dataset
22
+ export FESTCAT_FINAL_PATH='.../festcat_processed' # Path where preprocessed Festcat will be stored
23
+
24
+ ### Google_tts variables ###
25
+ export PATH_TO_GOOGLE_TTS_SHELL='.../data_processing/google_tts_processing_test.sh' # Absolute path to google_tts_processing_test.sh script
26
+ export PATH_TO_GOOGLE_TTS_PY='.../data_processing/extract_google_tts.py' # Absolute path to extract_google_tts.py script
27
+ export PATH_TO_GOOGLE_TTS_DATA='.../google_tts' # Path to Google TTS dataset
28
+ export GOOGLE_TTS_FINAL_PATH='.../google_tts_processed' # Path where preprocessed Google TTS will be stored
29
+
30
+ ### General variables ###
31
+ export VCTK_FORMATER_PATH='.../data_processing/ca_multi2vckt.py' # Absolute path to ca_multi2vckt.py script
32
+ export FINAL_PATH='.../multispeaker_ca_test/' # Path where preprocessed and vctk formatted datasets will be stored.
33
+ ```
34
+ #### Run preprocessing
35
+
36
+ Once the variables are correctly defined, execute the following command in the terminal:
37
+
38
+ `sh <...>/data_processing/process_data.sh`
39
+
40
+ The processed data, in VCTK format, will be written to the directory defined by `FINAL_PATH` (e.g. `export FINAL_PATH='.../multispeaker_ca_test/'`).
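+ 
+ For orientation, the resulting VCTK-style layout (produced by `ca_multi2vckt.py`: one transcription file and one wav per utterance, grouped by speaker) looks roughly like this:
+ 
+ ```
+ multispeaker_ca_test/
+ ├── txt/
+ │   └── <speaker_id>/
+ │       └── <utterance_id>.txt
+ └── wav/
+     └── <speaker_id>/
+         └── <utterance_id>.wav
+ ```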
data_processing/ca_multi2vckt.py ADDED
@@ -0,0 +1,152 @@
1
+ import os
2
+ import re
3
+ import argparse
4
+ from glob import glob
5
+ from pathlib import Path
6
+ from subprocess import call
7
+
8
+ def main():
9
+ my_parser = argparse.ArgumentParser()
10
+ my_parser.add_argument('--google-path',
11
+ metavar='path',
12
+ type=str,
13
+ help='path to the processed google_tts data')
14
+ my_parser.add_argument('--festcat-path',
15
+ metavar='path',
16
+ type=str,
17
+ help='path to the processed festcat data')
18
+ #my_parser.add_argument('--cv-path',
19
+ # metavar='path',
20
+ # type=str,
21
+ # help='the path to wavs file')
22
+ my_parser.add_argument('--final-path',
23
+ metavar='path',
24
+ type=str,
25
+ help='path where the vctk-formatted dataset will be written')
26
+ args = my_parser.parse_args()
27
+ google_path = args.google_path
28
+ festcat_path = args.festcat_path
29
+ #common_voice_path = args.cv_path
30
+ target_base_path = args.final_path
31
+
32
+ google_tts_male = google_path + "/male/"
33
+ google_tts_female = google_path + "/female/"
34
+ google_tts_paths = [google_tts_male, google_tts_female]
35
+
36
+ #google_tts_paths = ["/gpfs/scratch/bsc88/bsc88858/google_tts/male/","/gpfs/scratch/bsc88/bsc88858/google_tts/female/"]
37
+ #festcat_path = "/gpfs/scratch/bsc88/bsc88858/festcat/"
38
+ #common_voice_path = "/gpfs/scratch/bsc88/bsc88858/cv-corpus-9.0-2022-04-27/ca/"
39
+ #target_base_path = "/gpfs/scratch/bsc88/bsc88474/data/multispeaker_ca/"
40
+
41
+ if os.path.exists(google_path):
42
+ print("Converting google_tts data to vctk format")
43
+ convert_google(google_tts_paths, target_base_path)
44
+ else:
45
+ print("Google_tts processed data not found")
46
+
47
+ if os.path.exists(festcat_path):
48
+ print("Converting festcat data to vctk format")
49
+ convert_festcat(festcat_path, target_base_path)
50
+ else:
51
+ print("Festcat processed data not found")
52
+
53
+ #convert_cv(common_voice_path, target_base_path)
54
+
55
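+ # Convert the processed google_tts data (male/female subsets) into the VCTK-style
+ # layout: one transcription under txt/<speaker_id>/ and one wav under wav/<speaker_id>/.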
+ def convert_google(google_tts_paths, target_base_path):
56
+ for g_path in google_tts_paths:
57
+ meta_files = glob(f"{g_path}/*_*.txt")
58
+ for meta_file in meta_files:
59
+ print(meta_file)
60
+ for line in open(meta_file).readlines():
61
+ text_id, text = line.strip().split('|')
62
+ text = text.replace('¿','')
63
+ text = text.replace('¡','')
64
+ #speaker_id = '_'.join(text_id.split('_')[:2])
65
+ speaker_id = text_id.split('_')[1]
66
+ target_text_file = os.path.join(target_base_path, 'txt',
67
+ speaker_id, text_id+'.txt')
68
+ target_wav_file = os.path.join(target_base_path, 'wav',
69
+ speaker_id, text_id+'.wav')
70
+ source_wav_file = os.path.join(g_path, 'wavs', text_id+'.wav')
71
+
72
+ speaker_paths = [os.path.dirname(target_text_file),
73
+ os.path.dirname(target_wav_file)]
74
+
75
+ convert_meta(target_text_file, target_wav_file,
76
+ source_wav_file, speaker_paths, text)
77
+
78
+ def convert_meta(target_text_file,
79
+ target_wav_file,
80
+ source_wav_file,
81
+ speaker_paths, text):
82
+
83
+ # create directories
84
+ for speaker_path in speaker_paths:
85
+ if not os.path.isdir(speaker_path):
86
+ os.mkdir(speaker_path)
87
+
88
+ # write text file
89
+ with open(target_text_file, 'w') as out:
90
+ out.write(text)
91
+
92
+ # copy wav file
93
+ if not os.path.isfile(source_wav_file):
+ raise IOError('{} does not exist'.format(source_wav_file))
97
+
98
+ cp_args = ['cp', source_wav_file, target_wav_file]
99
+ if not os.path.isfile(target_wav_file):
100
+ #print(' '.join(cp_args))
101
+ call(cp_args)
102
+
103
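+ # Convert the processed festcat data into the same VCTK-style txt/ and wav/ layout,
+ # skipping transcription lines that contain '['.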
+ def convert_festcat(festcat_path, target_base_path):
104
+ meta_files = glob(f"{festcat_path}/*/*_train.txt")
105
+ for meta_file in meta_files:
106
+ speaker_name = meta_file.split(os.sep)[-2]
107
+ print(meta_file)
108
+ for line in open(meta_file).readlines():
109
+ if '[' not in line:
110
+ text_id, text = line.strip().split('|')
111
+ text = text.replace('¿','')
112
+ text = text.replace('¡','')
113
+ #speaker_id = '_'.join(text_id.split('_')[:3])
114
+ speaker_id = speaker_name
115
+ target_text_file = os.path.join(target_base_path, 'txt',
116
+ speaker_id, text_id+'.txt')
117
+ target_wav_file = os.path.join(target_base_path, 'wav',
118
+ speaker_id, text_id+'.wav')
119
+ source_wav_file = os.path.join(festcat_path, speaker_name,
120
+ 'wavs', text_id+'.wav')
121
+
122
+ speaker_paths = [os.path.dirname(target_text_file),
123
+ os.path.dirname(target_wav_file)]
124
+
125
+ convert_meta(target_text_file, target_wav_file,
126
+ source_wav_file, speaker_paths, text)
127
+ else:
128
+ print('line: {} skipped'.format(line))
129
+
130
+ def convert_cv(common_voice_path, target_base_path):
131
+ meta_files = glob(f"{common_voice_path}/*.txt")
132
+ for meta_file in meta_files:
133
+ print(meta_file)
134
+ speaker_id = meta_file.split(os.sep)[-1].replace("ca_","").replace(".txt","")
135
+ for line in open(meta_file).readlines():
136
+ text_id, text = line.strip().split('|')
137
+
138
+ target_text_file = os.path.join(target_base_path, 'txt',
139
+ speaker_id, text_id+'.txt')
140
+ target_wav_file = os.path.join(target_base_path, 'wav',
141
+ speaker_id, text_id+'.wav')
142
+ source_wav_file = os.path.join(common_voice_path,
143
+ 'wavs', text_id+'.wav')
144
+
145
+ speaker_paths = [os.path.dirname(target_text_file),
146
+ os.path.dirname(target_wav_file)]
147
+
148
+ convert_meta(target_text_file, target_wav_file,
149
+ source_wav_file, speaker_paths, text)
150
+
151
+ if __name__ == "__main__":
152
+ main()
data_processing/extract_festcat.py ADDED
@@ -0,0 +1,139 @@
1
+ import os
2
+ import re
3
+ import json
4
+ import subprocess
5
+ import argparse
6
+ import logging
7
+
8
+ logger = logging.getLogger(__name__)
9
+
10
+ def main():
11
+ my_parser = argparse.ArgumentParser()
12
+ my_parser.add_argument('--utterance-path',
13
+ metavar='path',
14
+ type=str,
15
+ help='the path to utterance file')
16
+ my_parser.add_argument('--wavs-path',
17
+ metavar='path',
18
+ type=str,
19
+ help='the path to wavs file')
20
+ my_parser.add_argument('--locutors',
21
+ metavar='N',
22
+ type=str,
23
+ help='list of speakers names/id separated with commas')
24
+ args = my_parser.parse_args()
25
+ locutors = args.locutors
26
+ locutors = locutors.replace(" ", "")
27
+ locutors = locutors.split(",")
28
+ utterance_path = args.utterance_path
29
+ wavs_path = args.wavs_path
30
+
31
+ for locutor in locutors:
32
+ # get durations
33
+ durations = get_durations_dict(wavs_path + '%s_sil_stats.csv'%locutor)
34
+ aggregate_duration = 0
35
+ rejected_duration = 0
36
+ large_duration = 0
37
+ total_duration = 0
38
+ path = 'upc_ca_%s_utt/utt'%locutor
39
+ path = utterance_path + path
40
+
41
+ files = []
42
+ long_files = []
43
+ for filename in os.listdir(path):
44
+ sentence = get_sentence(os.path.join(path, filename))
45
+ audio_filename = filename.replace('.utt','.wav') # upc_ca_pep_203479.wav
46
+ if sentence:
47
+ target_path = 'upc_ca_%s_wav_22k_sil_pad'%locutor
48
+ target_path = wavs_path + target_path
49
+ source_filename = 'upc_ca_%s_wav_22k_sil/'%locutor+audio_filename
50
+ source_filename = wavs_path + source_filename
51
+ total_duration += durations[audio_filename]
52
+
53
+ if os.path.isfile(source_filename):
54
+ if durations[audio_filename] < 10.0:
55
+ aggregate_duration += durations[audio_filename]
56
+ files.append((os.path.join(target_path,audio_filename), sentence))
57
+ #subprocess.call(['cp',source_filename, target_filename])
58
+ else:
59
+ long_files.append((audio_filename, sentence))
60
+ large_duration += durations[audio_filename]
61
+ else:
62
+ print(audio_filename)
63
+ else:
64
+ rejected_duration += durations[audio_filename]
65
+ out(args, locutor, files)
66
+ out_long(args, locutor, long_files)
67
+ out_long_json(args, locutor, long_files)
68
+ print(locutor, aggregate_duration/3600, 'hours')
69
+ print(locutor, 'rejected due to duration', large_duration/3600, 'hours')
70
+ print(locutor, 'rejected', rejected_duration/60, 'minutes')
71
+ print(locutor, total_duration, aggregate_duration+rejected_duration+large_duration)
72
+
73
+ def get_durations_dict(filename):
74
+ durations = {}
75
+
76
+ for line in open(filename).readlines():
77
+ d = line.split(',')
78
+ durations[d[0].split('/')[-1]] = float(d[1])
79
+ return durations
80
+
81
+ def get_sentence(filename):
82
+ utt_all = open(filename, encoding = "ISO-8859-1").read()
83
+ m = re.search('(\"\\\\\")(.+)(\\\\\"\")', utt_all)
84
+ sentence = m.groups()[1]
85
+ # delete interword dashes
86
+ sentence = re.sub('-(?=([A-Z]))', ' ', sentence)
87
+ if not re.search('\d', sentence):
88
+ return sentence
89
+ else:
90
+ #print(filename, sentence)
91
+ return None
92
+
93
+ def out(args, locutor, files):
94
+
95
+ outname_length = [('upc_%s_test.txt'%locutor,0),
96
+ ('upc_%s_val.txt'%locutor,0),
97
+ ('upc_%s_train.txt'%locutor,len(files))]
98
+ l_sum = sum([el[1] for el in outname_length])
99
+ if len(files) != l_sum:
100
+ msg = 'train vs test val distribution wrong: %i'%l_sum
101
+ raise ValueError(msg)
102
+
103
+ for fout, l in outname_length:
104
+ open((args.wavs_path + fout), mode= 'a').close()
105
+ logger.warning(f"fout: {fout}")
106
+ logger.warning(f"l: {l}")
107
+ logger.warning(f"Enable l: {len(files)-100}")
108
+ logger.warning(f"Files: {files}")
109
+ with open((args.wavs_path + fout), 'w') as out:
110
+ for i in range(l):
111
+ f, sentence = files.pop()
112
+ out.write('%s|%s\n'%(f.split("/")[-1].split(".")[-2],sentence))
113
+
114
+ def out_long(args, locutor, files):
115
+ outname = '%s_longsentences.csv'%locutor
116
+ outname_path = args.wavs_path + outname
117
+ open(outname_path, mode= 'a').close()
118
+ with open(outname_path, 'w') as out:
119
+ for audio, text in files:
120
+ out.write('%s,"%s"\n'%(audio, text))
121
+
122
+ def out_long_json(args, locutor, files):
123
+ outname = '%s_longsentences.json'%locutor
124
+ source = args.wavs_path +'upc_ca_%s_wav_22k_sil/'%locutor
125
+ outname_path = args.wavs_path + outname
126
+ open(outname_path, mode= 'a').close()
127
+ interventions = []
128
+ for audio, text in files:
129
+ intervention = {}
130
+ intervention['text'] = [(locutor, text)]
131
+ intervention['urls'] = [(locutor, os.path.join(source,audio))]
132
+ interventions.append(intervention)
133
+
134
+ with open(outname_path, 'w') as out:
135
+ json.dump({'session': interventions}, out, indent=2)
136
+
137
+ if __name__ == "__main__":
138
+ main()
139
+
data_processing/extract_google_tts.py ADDED
@@ -0,0 +1,168 @@
1
+ import os
2
+ import re
3
+ import json
4
+ import argparse
5
+ import logging
6
+ import csv
7
+ import numpy as np
8
+
9
+ logger = logging.getLogger(__name__)
10
+
11
+ def main():
12
+ my_parser = argparse.ArgumentParser()
13
+ my_parser.add_argument('--tsv-path',
14
+ metavar='path',
15
+ type=str,
16
+ help='the path to tsv file')
17
+ my_parser.add_argument('--wavs-path',
18
+ metavar='path',
19
+ type=str,
20
+ help='the path to wavs file')
21
+ my_parser.add_argument('--locutors',
22
+ metavar='N',
23
+ type=str,
24
+ help='list of speakers names/id separated with commas')
25
+ args = my_parser.parse_args()
26
+ locutors = args.locutors
27
+ locutors = locutors.replace(" ", "")
28
+ locutors = locutors.split(",")
29
+ tsv_path = args.tsv_path
30
+ wavs_path = args.wavs_path
31
+
32
+ for locutor in locutors:
33
+ # get durations
34
+ durations = get_durations_dict(wavs_path + '%s_sil_stats.csv'%locutor)
35
+ aggregate_duration = 0
36
+ rejected_duration = 0
37
+ large_duration = 0
38
+ total_duration = 0
39
+ tsv_name = "line_index_%s.tsv"%locutor
40
+ tsv_file_path = tsv_path + tsv_name
41
+
42
+ tsv_file = open(tsv_file_path)
43
+ read_tsv = csv.reader(tsv_file, delimiter="\t")
44
+ files = []
45
+ long_files = []
46
+ for row in read_tsv:
47
+ audio_filename = row[0] + ".wav"
48
+ #logger.warning(f"Audio_filename {audio_filename}")
49
+ sentence = row[-1]
50
+ if sentence:
51
+ target_path = 'ca_es_%s_22k_sil_pad'%locutor
52
+ target_path = wavs_path + target_path
53
+ source_filename = 'ca_es_%s_22k_sil/'%locutor+audio_filename ###
54
+ source_filename = wavs_path + source_filename
55
+ #logger.warning(f"source_filename {source_filename}")
56
+ total_duration += durations[audio_filename]
57
+ if os.path.isfile(source_filename):
58
+ if durations[audio_filename] < 10.0:
59
+ aggregate_duration += durations[audio_filename]
60
+ files.append((os.path.join(target_path,audio_filename), sentence))
61
+ #subprocess.call(['cp',source_filename, target_filename])
62
+ else:
63
+ long_files.append((audio_filename, sentence))
64
+ large_duration += durations[audio_filename]
65
+ else:
66
+ print(audio_filename)
67
+ else:
68
+ rejected_duration += durations[audio_filename]
69
+
70
+ speakers_id = find_speakers_id(wavs_path + '%s_sil_stats.csv'%locutor)
71
+ for id in speakers_id:
72
+ speaker_file = files_spliter(files = files, speaker_id = id)
73
+ if len(speaker_file) == 0:
74
+ continue
75
+ else:
76
+ out(args, speaker_id = id, files = speaker_file)
77
+ #print(f"mv {wavs_path}ca_{id}_test.txt {wavs_path}{locutor}")
78
+ #os.system(f"mv {wavs_path}ca_{id}_test.txt {wavs_path}{locutor}")
79
+ #os.system(f"mv {wavs_path}ca_{id}_val.txt {wavs_path}{locutor}")
80
+ #os.system(f"mv {wavs_path}ca_{id}_train.txt {wavs_path}{locutor}")
81
+ #out(args, locutor, files)
82
+ out_long(args, locutor, long_files)
83
+ out_long_json(args, locutor, long_files)
84
+ print(locutor, aggregate_duration/3600, 'hours')
85
+ print(locutor, 'rejected due to duration', large_duration/3600, 'hours')
86
+ print(locutor, 'rejected', rejected_duration/60, 'minutes')
87
+ print(locutor, total_duration, aggregate_duration+rejected_duration+large_duration)
88
+
89
+ def get_durations_dict(filename):
90
+ durations = {}
91
+ for line in open(filename).readlines():
92
+ d = line.split(',')
93
+ durations[d[0].split('/')[-1]] = float(d[1])
94
+ return durations
95
+
96
+ def get_sentence(filename):
97
+ utt_all = open(filename, encoding = "ISO-8859-1").read()
98
+ m = re.search('(\"\\\\\")(.+)(\\\\\"\")', utt_all)
99
+ sentence = m.groups()[1]
100
+ # delete interword dashes
101
+ sentence = re.sub('-(?=([A-Z]))', ' ', sentence)
102
+ if not re.search('\d', sentence):
103
+ return sentence
104
+ else:
105
+ print(filename, sentence)
106
+ return None
107
+
108
+ def out(args, speaker_id, files):
109
+ outname_length = [('ca_%s_test.txt'%speaker_id,0),
110
+ ('ca_%s_val.txt'%speaker_id,0),
111
+ ('ca_%s_train.txt'%speaker_id,len(files))]
112
+ l_sum = sum([el[1] for el in outname_length])
113
+ if len(files) != l_sum:
114
+ msg = 'train vs test val distribution wrong: %i'%l_sum
115
+ raise ValueError('msg')
116
+
117
+ for fout, l in outname_length:
118
+ open((args.wavs_path + fout), mode= 'a').close()
119
+ with open((args.wavs_path + fout), 'w') as out:
120
+ for i in range(l):
121
+ f, sentence = files.pop()
122
+ out.write('%s|%s\n'%(f.split("/")[-1].split(".")[-2],sentence))
123
+ print(len(files))
124
+
125
+ def out_long(args, locutor, files):
126
+ outname = '%s_longsentences.csv'%locutor
127
+ outname_path = args.wavs_path + outname
128
+ open(outname_path, mode= 'a').close()
129
+ with open(outname_path, 'w') as out:
130
+ for audio, text in files:
131
+ out.write('%s,"%s"\n'%(audio, text))
132
+
133
+ def out_long_json(args, locutor, files):
134
+ outname = '%s_longsentences.json'%locutor
135
+ source = args.wavs_path +'ca_es_%s_22k_sil/'%locutor
136
+ outname_path = args.wavs_path + outname
137
+ open(outname_path, mode= 'a').close()
138
+ interventions = []
139
+ for audio, text in files:
140
+ intervention = {}
141
+ intervention['text'] = [(locutor, text)]
142
+ intervention['urls'] = [(locutor, os.path.join(source,audio))]
143
+ interventions.append(intervention)
144
+
145
+ with open(outname_path, 'w') as out:
146
+ json.dump({'session': interventions}, out, indent=2)
147
+
148
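+ # Parse the *_sil_stats.csv file and return the unique speaker IDs encoded in the
+ # wav filenames (the second underscore-separated field).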
+ def find_speakers_id(path_tsv):
149
+ durations = {}
150
+ for line in open(path_tsv).readlines():
151
+ d = line.split(',')
152
+ durations[d[0].split('/')[-1]] = float(d[1])
153
+ keysList = list(durations.keys())
154
+ for index in range(len(keysList)):
155
+ keysList[index] = keysList[index].split("_")[1]
156
+ keysList = np.ndarray.tolist(np.unique(np.array(keysList)))
157
+ return keysList
158
+
159
+ def files_spliter(files, speaker_id):
160
+ out_file = []
161
+ for element in files:
162
+ if element[0].split("/")[-1].split("_")[1] == speaker_id:
163
+ out_file.append(element)
164
+ return out_file
165
+
166
+ if __name__ == "__main__":
167
+ main()
168
+
data_processing/festcat_processing_test.sh ADDED
@@ -0,0 +1,152 @@
1
+ #!/bin/sh
2
+
3
+
4
+ export FINAL_PATH=$1
5
+ export SOURCE_PATH=$2
6
+ export EXTRACT_PATH=$3
7
+
8
+
9
+ module load gcc/8.3.0 cuda/10.2 cudnn/7.6.4 nccl/2.4.8 tensorrt/6.0.1 openmpi/4.0.1 atlas scalapack/2.0.2 fftw/3.3.8 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 python/3.7.4_ML torch/1.9.0a0 fairseq/2021-10-04 llvm/10.0.1 mecab/0.996
10
+
11
+ for name in bet eli eva jan mar ona pau pep pol teo uri
12
+ do
13
+ echo "Processing $name data"
14
+ export SPEAKER_NAME=$name
15
+ export OUTPUT_CSV="${FINAL_PATH}/${SPEAKER_NAME}/${SPEAKER_NAME}_sil_stats.csv"
16
+ export UTTERANCE_PATH="${SOURCE_PATH}/${SPEAKER_NAME}/"
17
+
18
+ if [ -d "${FINAL_PATH}" ]; then
19
+ ### Take action if $DIR exists ###
20
+ echo "Path ${FINAL_PATH} already created"
21
+ else
22
+ ### Control will jump here if $DIR does NOT exists ###
23
+ mkdir ${FINAL_PATH}
24
+ echo "Crating: ${FINAL_PATH} "
25
+ fi
26
+
27
+ if [ -d "${FINAL_PATH}/${SPEAKER_NAME}" ]; then
28
+ ### Take action if $DIR exists ###
29
+ echo "Path ${FINAL_PATH}/${SPEAKER_NAME} already created"
30
+ else
31
+ ### Control will jump here if $DIR does NOT exists ###
32
+ mkdir ${FINAL_PATH}/${SPEAKER_NAME}
33
+ echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME} "
34
+ fi
35
+
36
+
37
+ if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav" ]; then
38
+ ### Take action if $DIR exists ###
39
+ echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav already created"
40
+ else
41
+ ### Control will jump here if $DIR does NOT exists ###
42
+ mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav
43
+ echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav "
44
+ fi
45
+
46
+
47
+ if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/)" ]; then
48
+ i=1
49
+ sp="/-\|"
50
+ for f in ${SOURCE_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_raw/recordings/*.raw; do
51
+ t=${f%.raw}.wav; g=${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/${t##*/}; sox -t raw -r 48k -e signed -b 16 -c 1 $f $g;
52
+ printf "\r Converting .raw audios to .wav ${sp:i++%${#sp}:1}"
53
+ sleep 0.05
54
+ done
55
+ else
56
+ echo "Already converted to .wav"
57
+ fi
58
+
59
+ if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k" ]; then
60
+ ### Take action if $DIR exists ###
61
+ echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k already created"
62
+ else
63
+ ### Control will jump here if $DIR does NOT exists ###
64
+ mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k
65
+ echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k "
66
+ fi
67
+
68
+ if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/)" ]; then
69
+ i=1
70
+ sp="/-\|"
71
+ for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav/*.wav; do
72
+ t=${f##*/}; ffmpeg -i $f -ar 22050 ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/$t -v error < /dev/null;
73
+ printf "\r Converting audios from 48kHz to 22kHz ${sp:i++%${#sp}:1}"
74
+ sleep 0.05
75
+ done;
76
+ else
77
+ echo "Already converted to 22kHz file"
78
+ fi
79
+
80
+ if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil" ]; then
81
+ ### Take action if $DIR exists ###
82
+ echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil already created"
83
+ else
84
+ ### Control will jump here if $DIR does NOT exists ###
85
+ mkdir ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil
86
+ echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil "
87
+ fi
88
+
89
+ if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/)" ]; then
90
+ i=1
91
+ sp="/-\|"
92
+ for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k/*.wav; do
93
+ t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/$t silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse;
94
+ printf "\r Filtering silence ${sp:i++%${#sp}:1}"
95
+ sleep 0.05
96
+ done
97
+ else
98
+ echo "Silence already eliminated"
99
+ fi
100
+
101
+ if [ -f "${OUTPUT_CSV}" ]; then
102
+ ### Take action if $DIR exists ###
103
+ echo "${OUTPUT_CSV} already exists!"
104
+ else
105
+ ### Control will jump here if $DIR does NOT exists ###
106
+ echo "Crating ${OUTPUT_CSV}"
107
+ for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/*.wav; do
108
+ d=`ffprobe -i $f -show_entries format=duration -v quiet -of csv="p=0"`;
109
+ echo $f,$d;
110
+ done >> ${OUTPUT_CSV}
111
+ fi
112
+
113
+ if [ -f "${FINAL_PATH}/${SPEAKER_NAME}/upc_${SPEAKER_NAME}_train.txt" ]; then
114
+ ### Take action if $DIR exists ###
115
+ echo "Splits already created!"
116
+ else
117
+ ### Control will jump here if $DIR does NOT exists ###
118
+ echo "Crating splits..."
119
+ python ${EXTRACT_PATH} --wavs-path ${FINAL_PATH}/${SPEAKER_NAME}/ --utterance-path ${UTTERANCE_PATH} --locutors ${SPEAKER_NAME}
120
+ fi
121
+
122
+ if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/wavs" ]; then
123
+ ### Take action if $DIR exists ###
124
+ echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil_pad already created"
125
+ else
126
+ ### Control will jump here if $DIR does NOT exists ###
127
+ mkdir ${FINAL_PATH}/${SPEAKER_NAME}/wavs
128
+ echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
129
+ fi
130
+
131
+ if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/wavs/)" ]; then
132
+ i=1
133
+ sp="/-\|"
134
+ for f in ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil/*.wav; do
135
+ t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/wavs/$t pad 0 0.058;
136
+ printf "\r Adding pad ${sp:i++%${#sp}:1}"
137
+ sleep 0.05
138
+ done
139
+ else
140
+ echo "Pad already added!"
141
+ fi
142
+
143
+ rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k_sil
144
+ rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav_22k
145
+ rm -r ${FINAL_PATH}/${SPEAKER_NAME}/upc_ca_${SPEAKER_NAME}_wav
146
+
147
+ done
148
+ echo "Done!"
149
+
150
+
151
+
152
+
data_processing/google_tts_processing_test.sh ADDED
@@ -0,0 +1,124 @@
1
+ #!/bin/sh
2
+
3
+
4
+ export FINAL_PATH=$1
5
+ export SOURCE_PATH=$2
6
+ export EXTRACT_PATH=$3
7
+
8
+
9
+
10
+ module load gcc/8.3.0 cuda/10.2 cudnn/7.6.4 nccl/2.4.8 tensorrt/6.0.1 openmpi/4.0.1 atlas scalapack/2.0.2 fftw/3.3.8 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 szip/2.1.1 ffmpeg/4.2.1 opencv/4.1.1 python/3.7.4_ML torch/1.9.0a0 fairseq/2021-10-04 llvm/10.0.1 mecab/0.996
11
+
12
+ for name in male female
13
+ do
14
+ export SPEAKER_NAME=$name
15
+ export OUTPUT_CSV="${FINAL_PATH}/${SPEAKER_NAME}/${SPEAKER_NAME}_sil_stats.csv"
16
+ export UTTERANCE_PATH="${SOURCE_PATH}/${SPEAKER_NAME}/"
17
+
18
+ if [ -d "${FINAL_PATH}" ]; then
19
+ ### Take action if $DIR exists ###
20
+ echo "Path ${FINAL_PATH} already created"
21
+ else
22
+ ### Control will jump here if $DIR does NOT exists ###
23
+ mkdir ${FINAL_PATH}
24
+ echo "Crating: ${FINAL_PATH} "
25
+ fi
26
+
27
+ if [ -d "${FINAL_PATH}/${SPEAKER_NAME}" ]; then
28
+ ### Take action if $DIR exists ###
29
+ echo "Path ${FINAL_PATH}/${SPEAKER_NAME} already created"
30
+ else
31
+ ### Control will jump here if $DIR does NOT exists ###
32
+ mkdir ${FINAL_PATH}/${SPEAKER_NAME}
33
+ echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME} "
34
+ fi
35
+
36
+ if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k" ]; then
37
+ ### Take action if $DIR exists ###
38
+ echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k already created"
39
+ else
40
+ ### Control will jump here if $DIR does NOT exists ###
41
+ mkdir ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k
42
+ echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k "
43
+ fi
44
+
45
+ if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/)" ]; then
46
+ i=1
47
+ sp="/-\|"
48
+ for f in ${SOURCE_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}/*.wav; do
49
+ t=${f##*/}; ffmpeg -i $f -ar 22050 ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/$t -v error < /dev/null;
50
+ printf "\r Converiting audios of 48kHz to 22kHz ${sp:i++%${#sp}:1}"
51
+ sleep 0.05
52
+ done;
53
+ else
54
+ echo "Already converted to 22kHz file"
55
+ fi
56
+
57
+ if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil" ]; then
58
+ ### Take action if $DIR exists ###
59
+ echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil already created"
60
+ else
61
+ ### Control will jump here if $DIR does NOT exists ###
62
+ mkdir ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil
63
+ echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil "
64
+ fi
65
+
66
+ if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/)" ]; then
67
+ i=1
68
+ sp="/-\|"
69
+ for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k/*.wav; do
70
+ t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/$t silence 1 0.02 0.5% reverse silence 1 0.02 0.5% reverse;
71
+ printf "\r Filtering silence ${sp:i++%${#sp}:1}"
72
+ sleep 0.05
73
+ done
74
+ else
75
+ echo "Silence has already been filtered!"
76
+ fi
77
+
78
+ if [ -f "${OUTPUT_CSV}" ]; then
79
+ ### Take action if $DIR exists ###
80
+ echo "${OUTPUT_CSV} already exists!"
81
+ else
82
+ ### Control will jump here if $DIR does NOT exists ###
83
+ echo "Crating ${OUTPUT_CSV}"
84
+ for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/*.wav; do
85
+ d=`ffprobe -i $f -show_entries format=duration -v quiet -of csv="p=0"`;
86
+ echo $f,$d;
87
+ done >> ${OUTPUT_CSV}
88
+ fi
89
+
90
+ if [ -f "${FINAL_PATH}/${SPEAKER_NAME}/ca_01591_train.txt" ]; then
91
+ ### Take action if $DIR exists ###
92
+ echo "Splits already created!"
93
+ else
94
+ ### Control will jump here if $DIR does NOT exists ###
95
+ echo "Crating splits..."
96
+ python ${EXTRACT_PATH} --wavs-path ${FINAL_PATH}/${SPEAKER_NAME}/ --tsv-path ${SOURCE_PATH}/${SPEAKER_NAME}/ --locutors ${SPEAKER_NAME}
97
+ fi
98
+
99
+ if [ -d "${FINAL_PATH}/${SPEAKER_NAME}/wavs" ]; then
100
+ ### Take action if $DIR exists ###
101
+ echo "Path ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
102
+ else
103
+ ### Control will jump here if $DIR does NOT exists ###
104
+ mkdir ${FINAL_PATH}/${SPEAKER_NAME}/wavs
105
+ echo "Crating: ${FINAL_PATH}/${SPEAKER_NAME}/wavs"
106
+ fi
107
+
108
+ if [ -z "$(ls -A ${FINAL_PATH}/${SPEAKER_NAME}/wavs/)" ]; then
109
+ i=1
110
+ sp="/-\|"
111
+ for f in ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil/*.wav; do
112
+ t=${f##*/}; sox $f ${FINAL_PATH}/${SPEAKER_NAME}/wavs/$t pad 0 0.058;
113
+ printf "\r Adding pad ${sp:i++%${#sp}:1}"
114
+ sleep 0.05
115
+ done
116
+ else
117
+ echo "Pad already added!"
118
+ fi
119
+
120
+ rm -r ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k_sil
121
+ rm -r ${FINAL_PATH}/${SPEAKER_NAME}/ca_es_${SPEAKER_NAME}_22k
122
+
123
+ done
124
+ echo "Done!"
data_processing/process_data.sh ADDED
@@ -0,0 +1,56 @@
1
+ #!/bin/bash
2
+
3
+ ### Festcat variables ###
4
+ export PATH_TO_FESTCAT_SHELL='/gpfs/scratch/bsc88/bsc88858/data_processing/festcat_processing_test.sh'
5
+ export PATH_TO_FESTCAT_PY='/gpfs/scratch/bsc88/bsc88858/data_processing/extract_festcat.py'
6
+ export PATH_TO_FESTCAT_DATA='/gpfs/scratch/bsc88/bsc88858/festcat/'
7
+ export FESTCAT_FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/festcat_processed'
8
+
9
+ ### Google_tts variables ###
10
+ export PATH_TO_GOOGLE_TTS_SHELL='/gpfs/scratch/bsc88/bsc88858/data_processing/google_tts_processing_test.sh'
11
+ export PATH_TO_GOOGLE_TTS_PY='/gpfs/scratch/bsc88/bsc88858/data_processing/extract_google_tts.py'
12
+ export PATH_TO_GOOGLE_TTS_DATA='/gpfs/scratch/bsc88/bsc88858/google_tts'
13
+ export GOOGLE_TTS_FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/google_tts_processed'
14
+
15
+ ### General variables ###
16
+ export VCTK_FORMATER_PATH='/gpfs/scratch/bsc88/bsc88858/data_processing/ca_multi2vckt.py'
17
+ export FINAL_PATH='/gpfs/scratch/bsc88/bsc88858/multispeaker_ca_test/'
18
+
19
+
20
+ if [ -d "${FESTCAT_FINAL_PATH}" ]; then
21
+ ### Take action if $DIR exists ###
22
+ echo "Path ${FESTCAT_FINAL_PATH} already exists"
23
+ else
24
+ ### Control will jump here if $DIR does NOT exists ###
25
+ if [ -d "${PATH_TO_FESTCAT_DATA}" ]; then
26
+ source ${PATH_TO_FESTCAT_SHELL} ${FESTCAT_FINAL_PATH} ${PATH_TO_FESTCAT_DATA} ${PATH_TO_FESTCAT_PY}
27
+ else
28
+ echo "Fescat data not found!"
29
+ fi
30
+ fi
31
+
32
+ if [ -d "${GOOGLE_TTS_FINAL_PATH}" ]; then
33
+ ### Take action if $DIR exists ###
34
+ echo "Path ${GOOGLE_TTS_FINAL_PATH} already exists"
35
+ else
36
+ ### Control will jump here if $DIR does NOT exists ###
37
+ if [ -d "${PATH_TO_GOOGLE_TTS_DATA}" ]; then
38
+ source ${PATH_TO_GOOGLE_TTS_SHELL} ${GOOGLE_TTS_FINAL_PATH} ${PATH_TO_GOOGLE_TTS_DATA} ${PATH_TO_GOOGLE_TTS_PY}
39
+ else
40
+ echo "Google TTS data not found!"
41
+ fi
42
+ fi
43
+
44
+ if [ -d "${FINAL_PATH}" ]; then
45
+ ### Take action if $DIR exists ###
46
+ echo "Path ${FINAL_PATH} already created"
47
+ else
48
+ ### Control will jump here if $DIR does NOT exists ###
49
+ mkdir ${FINAL_PATH}
50
+ mkdir ${FINAL_PATH}/txt/
51
+ mkdir ${FINAL_PATH}/wav/
52
+ echo "Crating: ${FINAL_PATH}"
53
+ python ${VCTK_FORMATER_PATH} --google-path ${GOOGLE_TTS_FINAL_PATH} --festcat-path ${FESTCAT_FINAL_PATH} --final-path ${FINAL_PATH}
54
+ fi
55
+
56
+ echo "Done!"
model/best_model.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b15fa7d2052bada1cf421e49d2d03b00e95b49fcd0e42b7af1d92da2880cdecc
3
+ size 1038659133
model/config.json ADDED
@@ -0,0 +1,262 @@
1
+ {
2
+ "output_path": "/gpfs/projects/bsc88/speech/tts/TTS_v0.8.0/recipes/multispeaker/experiments_from_previous",
3
+ "logger_uri": null,
4
+ "run_name": "multispeaker_vits_ca_1e4_1e4_32",
5
+ "project_name": null,
6
+ "run_description": "\ud83d\udc38Coqui trainer run.",
7
+ "print_step": 25,
8
+ "plot_step": 100,
9
+ "model_param_stats": false,
10
+ "wandb_entity": null,
11
+ "dashboard_logger": "tensorboard",
12
+ "log_model_step": 1000,
13
+ "save_step": 1000,
14
+ "save_n_checkpoints": 5,
15
+ "save_checkpoints": true,
16
+ "save_all_best": true,
17
+ "save_best_after": 10000,
18
+ "target_loss": null,
19
+ "print_eval": true,
20
+ "test_delay_epochs": -1,
21
+ "run_eval": true,
22
+ "run_eval_steps": null,
23
+ "distributed_backend": "nccl",
24
+ "distributed_url": "tcp://localhost:54321",
25
+ "mixed_precision": false,
26
+ "epochs": 1000,
27
+ "batch_size": 16,
28
+ "eval_batch_size": 8,
29
+ "grad_clip": [
30
+ 1000.0,
31
+ 1000.0
32
+ ],
33
+ "scheduler_after_epoch": true,
34
+ "lr": 0.001,
35
+ "optimizer": "AdamW",
36
+ "optimizer_params": {
37
+ "betas": [
38
+ 0.8,
39
+ 0.99
40
+ ],
41
+ "eps": 1e-09,
42
+ "weight_decay": 0.01
43
+ },
44
+ "lr_scheduler": "",
45
+ "lr_scheduler_params": null,
46
+ "use_grad_scaler": false,
47
+ "cudnn_enable": true,
48
+ "cudnn_deterministic": false,
49
+ "cudnn_benchmark": false,
50
+ "training_seed": 54321,
51
+ "model": "vits",
52
+ "num_loader_workers": 4,
53
+ "num_eval_loader_workers": 4,
54
+ "use_noise_augment": false,
55
+ "audio": {
56
+ "fft_size": 1024,
57
+ "sample_rate": 22050,
58
+ "win_length": 1024,
59
+ "hop_length": 256,
60
+ "num_mels": 80,
61
+ "mel_fmin": 0,
62
+ "mel_fmax": null
63
+ },
64
+ "use_phonemes": true,
65
+ "phonemizer": "espeak",
66
+ "phoneme_language": "ca",
67
+ "compute_input_seq_cache": true,
68
+ "text_cleaner": "multilingual_cleaners",
69
+ "enable_eos_bos_chars": false,
70
+ "test_sentences_file": "",
71
+ "phoneme_cache_path": "/gpfs/projects/bsc88/speech/tts/TTS_v0.8.0/recipes/multispeaker/phoneme_cache",
72
+ "characters": {
73
+ "characters_class": "TTS.tts.utils.text.characters.IPAPhonemes",
74
+ "vocab_dict": null,
75
+ "pad": "<PAD>",
76
+ "eos": "<EOS>",
77
+ "bos": "<BOS>",
78
+ "blank": "<BLNK>",
79
+ "characters": "iy\u0268\u0289\u026fu\u026a\u028f\u028ae\u00f8\u0258\u0259\u0275\u0264o\u025b\u0153\u025c\u025e\u028c\u0254\u00e6\u0250a\u0276\u0251\u0252\u1d7b\u0298\u0253\u01c0\u0257\u01c3\u0284\u01c2\u0260\u01c1\u029bpbtd\u0288\u0256c\u025fk\u0261q\u0262\u0294\u0274\u014b\u0272\u0273n\u0271m\u0299r\u0280\u2c71\u027e\u027d\u0278\u03b2fv\u03b8\u00f0sz\u0283\u0292\u0282\u0290\u00e7\u029dx\u0263\u03c7\u0281\u0127\u0295h\u0266\u026c\u026e\u028b\u0279\u027bj\u0270l\u026d\u028e\u029f\u02c8\u02cc\u02d0\u02d1\u028dw\u0265\u029c\u02a2\u02a1\u0255\u0291\u027a\u0267\u02b2\u025a\u02de\u026b",
80
+ "punctuations": "!'(),-.:;? ",
81
+ "phonemes": null,
82
+ "is_unique": false,
83
+ "is_sorted": true
84
+ },
85
+ "add_blank": true,
86
+ "batch_group_size": 5,
87
+ "loss_masking": null,
88
+ "min_audio_len": 1,
89
+ "max_audio_len": Infinity,
90
+ "min_text_len": 1,
91
+ "max_text_len": 325,
92
+ "compute_f0": false,
93
+ "compute_linear_spec": true,
94
+ "precompute_num_workers": 0,
95
+ "start_by_longest": false,
96
+ "datasets": [
97
+ {
98
+ "formatter": "vctk_old",
99
+ "dataset_name": "vctk_old",
100
+ "path": "/gpfs/scratch/bsc88/bsc88474/data/multispeaker_ca",
101
+ "meta_file_train": "",
102
+ "ignored_speakers": [
103
+ "uri",
104
+ "09796",
105
+ "05450"
106
+ ],
107
+ "language": "ca",
108
+ "meta_file_val": "",
109
+ "meta_file_attn_mask": ""
110
+ }
111
+ ],
112
+ "test_sentences": [
113
+ [
114
+ "Per exemple, dels nostres bancs que inverteixen en armament de les nostres empreses."
115
+ ],
116
+ [
117
+ "Preguntin-se si aix\u00f2 era necessari."
118
+ ],
119
+ [
120
+ "La suposada ocultaci\u00f3 dels informes que advertien de risc s\u00edsmic."
121
+ ],
122
+ [
123
+ "\u00c9s de 633 milions d'euros quan es far\u00e0 la publicaci\u00f3 detallada."
124
+ ]
125
+ ],
126
+ "eval_split_max_size": null,
127
+ "eval_split_size": 0.01,
128
+ "use_speaker_weighted_sampler": false,
129
+ "speaker_weighted_sampler_alpha": 1.0,
130
+ "use_language_weighted_sampler": false,
131
+ "language_weighted_sampler_alpha": 1.0,
132
+ "use_length_weighted_sampler": false,
133
+ "length_weighted_sampler_alpha": 1.0,
134
+ "model_args": {
135
+ "num_chars": 131,
136
+ "out_channels": 513,
137
+ "spec_segment_size": 32,
138
+ "hidden_channels": 192,
139
+ "hidden_channels_ffn_text_encoder": 768,
140
+ "num_heads_text_encoder": 2,
141
+ "num_layers_text_encoder": 6,
142
+ "kernel_size_text_encoder": 3,
143
+ "dropout_p_text_encoder": 0.1,
144
+ "dropout_p_duration_predictor": 0.5,
145
+ "kernel_size_posterior_encoder": 5,
146
+ "dilation_rate_posterior_encoder": 1,
147
+ "num_layers_posterior_encoder": 16,
148
+ "kernel_size_flow": 5,
149
+ "dilation_rate_flow": 1,
150
+ "num_layers_flow": 4,
151
+ "resblock_type_decoder": "1",
152
+ "resblock_kernel_sizes_decoder": [
153
+ 3,
154
+ 7,
155
+ 11
156
+ ],
157
+ "resblock_dilation_sizes_decoder": [
158
+ [
159
+ 1,
160
+ 3,
161
+ 5
162
+ ],
163
+ [
164
+ 1,
165
+ 3,
166
+ 5
167
+ ],
168
+ [
169
+ 1,
170
+ 3,
171
+ 5
172
+ ]
173
+ ],
174
+ "upsample_rates_decoder": [
175
+ 8,
176
+ 8,
177
+ 2,
178
+ 2
179
+ ],
180
+ "upsample_initial_channel_decoder": 512,
181
+ "upsample_kernel_sizes_decoder": [
182
+ 16,
183
+ 16,
184
+ 4,
185
+ 4
186
+ ],
187
+ "periods_multi_period_discriminator": [
188
+ 2,
189
+ 3,
190
+ 5,
191
+ 7,
192
+ 11
193
+ ],
194
+ "use_sdp": true,
195
+ "noise_scale": 1.0,
196
+ "inference_noise_scale": 0.667,
197
+ "length_scale": 1.0,
198
+ "noise_scale_dp": 1.0,
199
+ "inference_noise_scale_dp": 1.0,
200
+ "max_inference_len": null,
201
+ "init_discriminator": true,
202
+ "use_spectral_norm_disriminator": false,
203
+ "use_speaker_embedding": true,
204
+ "num_speakers": 257,
205
+ "speakers_file": "/home/user/app/speakers.pth",
206
+ "d_vector_file": null,
207
+ "speaker_embedding_channels": 256,
208
+ "use_d_vector_file": false,
209
+ "d_vector_dim": 0,
210
+ "detach_dp_input": true,
211
+ "use_language_embedding": false,
212
+ "embedded_language_dim": 4,
213
+ "num_languages": 0,
214
+ "language_ids_file": null,
215
+ "use_speaker_encoder_as_loss": false,
216
+ "speaker_encoder_config_path": "",
217
+ "speaker_encoder_model_path": "",
218
+ "condition_dp_on_speaker": true,
219
+ "freeze_encoder": false,
220
+ "freeze_DP": false,
221
+ "freeze_PE": false,
222
+ "freeze_flow_decoder": false,
223
+ "freeze_waveform_decoder": false,
224
+ "encoder_sample_rate": null,
225
+ "interpolate_z": true,
226
+ "reinit_DP": false,
227
+ "reinit_text_encoder": false
228
+ },
229
+ "lr_gen": 0.0001,
230
+ "lr_disc": 0.0001,
231
+ "lr_scheduler_gen": "ExponentialLR",
232
+ "lr_scheduler_gen_params": {
233
+ "gamma": 0.999875,
234
+ "last_epoch": -1
235
+ },
236
+ "lr_scheduler_disc": "ExponentialLR",
237
+ "lr_scheduler_disc_params": {
238
+ "gamma": 0.999875,
239
+ "last_epoch": -1
240
+ },
241
+ "kl_loss_alpha": 1.0,
242
+ "disc_loss_alpha": 1.0,
243
+ "gen_loss_alpha": 1.0,
244
+ "feat_loss_alpha": 1.0,
245
+ "mel_loss_alpha": 45.0,
246
+ "dur_loss_alpha": 1.0,
247
+ "speaker_encoder_loss_alpha": 1.0,
248
+ "return_wav": true,
249
+ "use_weighted_sampler": false,
250
+ "weighted_sampler_attrs": null,
251
+ "weighted_sampler_multipliers": null,
252
+ "r": 1,
253
+ "num_speakers": 257,
254
+ "use_speaker_embedding": true,
255
+ "speakers_file": "/home/user/app/speakers.pth",
256
+ "speaker_embedding_channels": 256,
257
+ "language_ids_file": null,
258
+ "use_language_embedding": false,
259
+ "use_d_vector_file": false,
260
+ "d_vector_file": null,
261
+ "d_vector_dim": 0
262
+ }
model/speakers.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6dacda0b8dd3e111c5072f8f33c08b4a29b92ac79aaf22ceca912d01e7deb905
3
+ size 30191