ESPnet
Chinese
audio
singing-voice-synthesis
ftshijt committed on
Commit
e3083d8
·
1 Parent(s): cd2f4ea

Update model

README.md CHANGED
@@ -1,3 +1,372 @@
1
  ---
2
- license: apache-2.0
3
  ---
1
  ---
2
+ tags:
3
+ - espnet
4
+ - audio
5
+ - singing-voice-synthesis
6
+ language: zh
7
+ datasets:
8
+ - opencpop
9
+ license: cc-by-4.0
10
  ---
11
+
12
+ ## ESPnet2 SVS model
13
+
14
+ ### `espnet/opencpop_naive_rnn_dp`
15
+
16
+ This model was trained by ftshijt using the opencpop recipe in [espnet](https://github.com/espnet/espnet/).
17
+
18
+ ### Demo: How to use in ESPnet2
19
+
20
+ Follow the [ESPnet installation instructions](https://espnet.github.io/espnet/installation.html)
21
+ if you haven't done that already.
22
+
23
+ ```bash
24
+ cd espnet
25
+ git checkout 5c4d7cf7feba8461de2e1080bf82182f0efaef38
26
+ pip install -e .
27
+ cd egs2/opencpop/svs1
28
+ ./run.sh --skip_data_prep false --skip_train true --download_model espnet/opencpop_naive_rnn_dp
29
+ ```
30
+
31
+
32
+
33
+ ## SVS config
34
+
35
+ <details><summary>expand</summary>
36
+
37
+ ```
38
+ config: conf/tuning/train_naive_rnn_dp.yaml
39
+ print_config: false
40
+ log_level: INFO
41
+ drop_last_iter: false
42
+ dry_run: false
43
+ iterator_type: sequence
44
+ valid_iterator_type: null
45
+ output_dir: exp/svs_train_naive_rnn_dp_raw_phn_None_zh
46
+ ngpu: 1
47
+ seed: 0
48
+ num_workers: 8
49
+ num_att_plot: 3
50
+ dist_backend: nccl
51
+ dist_init_method: env://
52
+ dist_world_size: null
53
+ dist_rank: null
54
+ local_rank: 0
55
+ dist_master_addr: null
56
+ dist_master_port: null
57
+ dist_launcher: null
58
+ multiprocessing_distributed: false
59
+ unused_parameters: false
60
+ sharded_ddp: false
61
+ cudnn_enabled: true
62
+ cudnn_benchmark: false
63
+ cudnn_deterministic: true
64
+ collect_stats: false
65
+ write_collected_feats: false
66
+ max_epoch: 500
67
+ patience: null
68
+ val_scheduler_criterion:
69
+ - valid
70
+ - loss
71
+ early_stopping_criterion:
72
+ - valid
73
+ - loss
74
+ - min
75
+ best_model_criterion:
76
+ - - valid
77
+ - loss
78
+ - min
79
+ - - train
80
+ - loss
81
+ - min
82
+ keep_nbest_models: 2
83
+ nbest_averaging_interval: 0
84
+ grad_clip: 1.0
85
+ grad_clip_type: 2.0
86
+ grad_noise: false
87
+ accum_grad: 1
88
+ no_forward_run: false
89
+ resume: true
90
+ train_dtype: float32
91
+ use_amp: false
92
+ log_interval: null
93
+ use_matplotlib: true
94
+ use_tensorboard: true
95
+ create_graph_in_tensorboard: false
96
+ use_wandb: false
97
+ wandb_project: null
98
+ wandb_id: null
99
+ wandb_entity: null
100
+ wandb_name: null
101
+ wandb_model_log_interval: -1
102
+ detect_anomaly: false
103
+ use_lora: false
104
+ save_lora_only: true
105
+ lora_conf: {}
106
+ pretrain_path: null
107
+ init_param: []
108
+ ignore_init_mismatch: false
109
+ freeze_param: []
110
+ num_iters_per_epoch: null
111
+ batch_size: 16
112
+ valid_batch_size: null
113
+ batch_bins: 1000000
114
+ valid_batch_bins: null
115
+ train_shape_file:
116
+ - exp/svs_stats_raw_phn_None_zh/train/text_shape.phn
117
+ - exp/svs_stats_raw_phn_None_zh/train/singing_shape
118
+ valid_shape_file:
119
+ - exp/svs_stats_raw_phn_None_zh/valid/text_shape.phn
120
+ - exp/svs_stats_raw_phn_None_zh/valid/singing_shape
121
+ batch_type: sorted
122
+ valid_batch_type: null
123
+ fold_length:
124
+ - 150
125
+ - 240000
126
+ sort_in_batch: descending
127
+ shuffle_within_batch: false
128
+ sort_batch: descending
129
+ multiple_iterator: false
130
+ chunk_length: 500
131
+ chunk_shift_ratio: 0.5
132
+ num_cache_chunks: 1024
133
+ chunk_excluded_key_prefixes: []
134
+ chunk_default_fs: null
135
+ train_data_path_and_name_and_type:
136
+ - - dump24k/raw/tr_no_dev/text
137
+ - text
138
+ - text
139
+ - - dump24k/raw/tr_no_dev/wav.scp
140
+ - singing
141
+ - sound
142
+ - - dump24k/raw/tr_no_dev/label
143
+ - label
144
+ - duration
145
+ - - dump24k/raw/tr_no_dev/score.scp
146
+ - score
147
+ - score
148
+ - - exp/svs_stats_raw_phn_None_zh/train/collect_feats/pitch.scp
149
+ - pitch
150
+ - npy
151
+ - - exp/svs_stats_raw_phn_None_zh/train/collect_feats/feats.scp
152
+ - feats
153
+ - npy
154
+ valid_data_path_and_name_and_type:
155
+ - - dump24k/raw/dev/text
156
+ - text
157
+ - text
158
+ - - dump24k/raw/dev/wav.scp
159
+ - singing
160
+ - sound
161
+ - - dump24k/raw/dev/label
162
+ - label
163
+ - duration
164
+ - - dump24k/raw/dev/score.scp
165
+ - score
166
+ - score
167
+ - - exp/svs_stats_raw_phn_None_zh/valid/collect_feats/pitch.scp
168
+ - pitch
169
+ - npy
170
+ - - exp/svs_stats_raw_phn_None_zh/valid/collect_feats/feats.scp
171
+ - feats
172
+ - npy
173
+ allow_variable_data_keys: false
174
+ max_cache_size: 0.0
175
+ max_cache_fd: 32
176
+ allow_multi_rates: false
177
+ valid_max_cache_size: null
178
+ exclude_weight_decay: false
179
+ exclude_weight_decay_conf: {}
180
+ optim: adam
181
+ optim_conf:
182
+ lr: 0.001
183
+ eps: 1.0e-06
184
+ weight_decay: 0.0
185
+ scheduler: null
186
+ scheduler_conf: {}
187
+ token_list:
188
+ - <blank>
189
+ - <unk>
190
+ - SP
191
+ - i
192
+ - AP
193
+ - e
194
+ - y
195
+ - d
196
+ - w
197
+ - sh
198
+ - ai
199
+ - n
200
+ - x
201
+ - j
202
+ - ian
203
+ - u
204
+ - l
205
+ - h
206
+ - b
207
+ - o
208
+ - zh
209
+ - an
210
+ - ou
211
+ - m
212
+ - q
213
+ - z
214
+ - en
215
+ - g
216
+ - ing
217
+ - ei
218
+ - ao
219
+ - ang
220
+ - uo
221
+ - eng
222
+ - t
223
+ - a
224
+ - ong
225
+ - ui
226
+ - k
227
+ - f
228
+ - r
229
+ - iang
230
+ - ch
231
+ - v
232
+ - in
233
+ - iao
234
+ - ie
235
+ - iu
236
+ - c
237
+ - s
238
+ - van
239
+ - p
240
+ - ve
241
+ - uan
242
+ - uang
243
+ - ia
244
+ - ua
245
+ - uai
246
+ - un
247
+ - er
248
+ - vn
249
+ - iong
250
+ - <sos/eos>
251
+ odim: null
252
+ model_conf: {}
253
+ use_preprocessor: true
254
+ token_type: phn
255
+ bpemodel: null
256
+ non_linguistic_symbols: null
257
+ cleaner: null
258
+ g2p: null
259
+ fs: 24000
260
+ score_feats_extract: syllable_score_feats
261
+ score_feats_extract_conf:
262
+ fs: 24000
263
+ n_fft: 2048
264
+ win_length: 1200
265
+ hop_length: 300
266
+ feats_extract: fbank
267
+ feats_extract_conf:
268
+ n_fft: 2048
269
+ hop_length: 300
270
+ win_length: 1200
271
+ fs: 24000
272
+ fmin: 80
273
+ fmax: 7600
274
+ n_mels: 80
275
+ normalize: global_mvn
276
+ normalize_conf:
277
+ stats_file: exp/svs_stats_raw_phn_None_zh/train/feats_stats.npz
278
+ svs: naive_rnn_dp
279
+ svs_conf:
280
+ midi_dim: 129
281
+ embed_dim: 512
282
+ duration_dim: 512
283
+ eprenet_conv_layers: 0
284
+ eprenet_conv_chans: 256
285
+ eprenet_conv_filts: 3
286
+ elayers: 3
287
+ eunits: 256
288
+ ebidirectional: true
289
+ midi_embed_integration_type: add
290
+ dlayers: 2
291
+ dunits: 256
292
+ dbidirectional: true
293
+ postnet_layers: 5
294
+ postnet_chans: 512
295
+ postnet_filts: 5
296
+ use_batch_norm: true
297
+ reduction_factor: 1
298
+ eprenet_dropout_rate: 0.2
299
+ edropout_rate: 0.1
300
+ ddropout_rate: 0.1
301
+ postnet_dropout_rate: 0.5
302
+ init_type: pytorch
303
+ use_masking: true
304
+ pitch_extract: dio
305
+ pitch_extract_conf:
306
+ use_token_averaged_f0: false
307
+ fs: 24000
308
+ n_fft: 2048
309
+ hop_length: 300
310
+ f0max: 800
311
+ f0min: 80
312
+ reduction_factor: 1
313
+ pitch_normalize: global_mvn
314
+ pitch_normalize_conf:
315
+ stats_file: exp/svs_stats_raw_phn_None_zh/train/pitch_stats.npz
316
+ ying_extract: null
317
+ ying_extract_conf: {}
318
+ energy_extract: null
319
+ energy_extract_conf: {}
320
+ energy_normalize: null
321
+ energy_normalize_conf: {}
322
+ required:
323
+ - output_dir
324
+ - token_list
325
+ version: '202310'
326
+ distributed: false
327
+ ```
328
+
329
+ </details>
330
+
331
+
332
+
333
+ ### Citing ESPnet
334
+
335
+ ```bibtex
336
+ @inproceedings{watanabe2018espnet,
337
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
338
+ title={{ESPnet}: End-to-End Speech Processing Toolkit},
339
+ year={2018},
340
+ booktitle={Proceedings of Interspeech},
341
+ pages={2207--2211},
342
+ doi={10.21437/Interspeech.2018-1456},
343
+ url={http://dx.doi.org/10.21437/Interspeech.2018-1456}
344
+ }
345
+
346
+
347
+
348
+
349
+
350
+
351
+ @inproceedings{shi22d_interspeech,
352
+ author={Jiatong Shi and Shuai Guo and Tao Qian and Tomoki Hayashi and Yuning Wu and Fangzheng Xu and Xuankai Chang and Huazhe Li and Peter Wu and Shinji Watanabe and Qin Jin},
353
+ title={{Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis}},
354
+ year=2022,
355
+ booktitle={Proc. Interspeech 2022},
356
+ pages={4277--4281},
357
+ doi={10.21437/Interspeech.2022-10039}
358
+ }
359
+ ```
360
+
361
+ or arXiv:
362
+
363
+ ```bibtex
364
+ @misc{watanabe2018espnet,
365
+ title={ESPnet: End-to-End Speech Processing Toolkit},
366
+ author={Shinji Watanabe and Takaaki Hori and Shigeki Karita and Tomoki Hayashi and Jiro Nishitoba and Yuya Unno and Nelson Yalta and Jahn Heymann and Matthew Wiesner and Nanxin Chen and Adithya Renduchintala and Tsubasa Ochiai},
367
+ year={2018},
368
+ eprint={1804.00015},
369
+ archivePrefix={arXiv},
370
+ primaryClass={cs.CL}
371
+ }
372
+ ```
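Note on the feature settings in the README config: `fs: 24000`, `hop_length: 300`, and `win_length: 1200` together fix the time resolution of the mel features, and the second `fold_length` entry (`240000`) can be read as a duration. The sketch below works through that arithmetic in plain Python. The function and variable names are illustrative and not part of ESPnet, and treating `240000` as a sample count is an assumption based on its pairing with the `singing_shape` file.

```python
# Values copied from the SVS config; all names below are illustrative only.
FS = 24000                  # sampling rate in Hz (fs)
HOP_LENGTH = 300            # samples between successive frames (hop_length)
WIN_LENGTH = 1200           # samples per analysis window (win_length)
FOLD_LENGTH_AUDIO = 240000  # second fold_length entry, ASSUMED to be in samples


def samples_to_ms(num_samples: int, fs: int = FS) -> float:
    """Convert a sample count to milliseconds at sampling rate fs."""
    return 1000.0 * num_samples / fs


frame_shift_ms = samples_to_ms(HOP_LENGTH)  # time step between feature frames
window_ms = samples_to_ms(WIN_LENGTH)       # duration covered by one window
frames_per_second = FS / HOP_LENGTH         # feature frames per second of audio
fold_seconds = FOLD_LENGTH_AUDIO / FS       # fold length expressed in seconds

print(f"frame shift: {frame_shift_ms} ms")
print(f"window:      {window_ms} ms")
print(f"frame rate:  {frames_per_second} frames/s")
print(f"fold length: {fold_seconds} s")
```

With these settings each frame advances by 12.5 ms and spans a 50 ms window, so consecutive windows overlap by 75%.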
exp/svs_stats_raw_phn_None_zh/train/feats_stats.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f0af3f18f8b910e800d5af007e547f233de55fea0081348958d2edb4c9681bed
3
+ size 1402
exp/svs_stats_raw_phn_None_zh/train/pitch_stats.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8bfc536c55a496e833af3b67fb52de8ec9e1d9a57bd7d86344513799e725816a
3
+ size 770
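Note on the two `.npz` entries: what the diff shows is a Git LFS pointer (a `version` line, an `oid sha256:` digest, and a `size` in bytes), not the statistics themselves. A minimal sketch of how such a pointer could be parsed and used to check a downloaded file, using only the standard library, follows. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not functions from Git LFS or ESPnet.

```python
import hashlib

# Pointer text as shown in this commit for feats_stats.npz.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f0af3f18f8b910e800d5af007e547f233de55fea0081348958d2edb4c9681bed
size 1402
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data: bytes, pointer: dict) -> bool:
    """Return True if data has the size and SHA-256 digest the pointer records."""
    if pointer["hash_algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['hash_algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```

To check a real download, read the file in binary mode and pass its bytes to `matches_pointer`; a mismatch usually means the pointer file itself was fetched instead of the LFS object.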
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/config.yaml ADDED
@@ -0,0 +1,289 @@
1
+ config: conf/tuning/train_naive_rnn_dp.yaml
2
+ print_config: false
3
+ log_level: INFO
4
+ drop_last_iter: false
5
+ dry_run: false
6
+ iterator_type: sequence
7
+ valid_iterator_type: null
8
+ output_dir: exp/svs_train_naive_rnn_dp_raw_phn_None_zh
9
+ ngpu: 1
10
+ seed: 0
11
+ num_workers: 8
12
+ num_att_plot: 3
13
+ dist_backend: nccl
14
+ dist_init_method: env://
15
+ dist_world_size: null
16
+ dist_rank: null
17
+ local_rank: 0
18
+ dist_master_addr: null
19
+ dist_master_port: null
20
+ dist_launcher: null
21
+ multiprocessing_distributed: false
22
+ unused_parameters: false
23
+ sharded_ddp: false
24
+ cudnn_enabled: true
25
+ cudnn_benchmark: false
26
+ cudnn_deterministic: true
27
+ collect_stats: false
28
+ write_collected_feats: false
29
+ max_epoch: 500
30
+ patience: null
31
+ val_scheduler_criterion:
32
+ - valid
33
+ - loss
34
+ early_stopping_criterion:
35
+ - valid
36
+ - loss
37
+ - min
38
+ best_model_criterion:
39
+ - - valid
40
+ - loss
41
+ - min
42
+ - - train
43
+ - loss
44
+ - min
45
+ keep_nbest_models: 2
46
+ nbest_averaging_interval: 0
47
+ grad_clip: 1.0
48
+ grad_clip_type: 2.0
49
+ grad_noise: false
50
+ accum_grad: 1
51
+ no_forward_run: false
52
+ resume: true
53
+ train_dtype: float32
54
+ use_amp: false
55
+ log_interval: null
56
+ use_matplotlib: true
57
+ use_tensorboard: true
58
+ create_graph_in_tensorboard: false
59
+ use_wandb: false
60
+ wandb_project: null
61
+ wandb_id: null
62
+ wandb_entity: null
63
+ wandb_name: null
64
+ wandb_model_log_interval: -1
65
+ detect_anomaly: false
66
+ use_lora: false
67
+ save_lora_only: true
68
+ lora_conf: {}
69
+ pretrain_path: null
70
+ init_param: []
71
+ ignore_init_mismatch: false
72
+ freeze_param: []
73
+ num_iters_per_epoch: null
74
+ batch_size: 16
75
+ valid_batch_size: null
76
+ batch_bins: 1000000
77
+ valid_batch_bins: null
78
+ train_shape_file:
79
+ - exp/svs_stats_raw_phn_None_zh/train/text_shape.phn
80
+ - exp/svs_stats_raw_phn_None_zh/train/singing_shape
81
+ valid_shape_file:
82
+ - exp/svs_stats_raw_phn_None_zh/valid/text_shape.phn
83
+ - exp/svs_stats_raw_phn_None_zh/valid/singing_shape
84
+ batch_type: sorted
85
+ valid_batch_type: null
86
+ fold_length:
87
+ - 150
88
+ - 240000
89
+ sort_in_batch: descending
90
+ shuffle_within_batch: false
91
+ sort_batch: descending
92
+ multiple_iterator: false
93
+ chunk_length: 500
94
+ chunk_shift_ratio: 0.5
95
+ num_cache_chunks: 1024
96
+ chunk_excluded_key_prefixes: []
97
+ chunk_default_fs: null
98
+ train_data_path_and_name_and_type:
99
+ - - dump24k/raw/tr_no_dev/text
100
+ - text
101
+ - text
102
+ - - dump24k/raw/tr_no_dev/wav.scp
103
+ - singing
104
+ - sound
105
+ - - dump24k/raw/tr_no_dev/label
106
+ - label
107
+ - duration
108
+ - - dump24k/raw/tr_no_dev/score.scp
109
+ - score
110
+ - score
111
+ - - exp/svs_stats_raw_phn_None_zh/train/collect_feats/pitch.scp
112
+ - pitch
113
+ - npy
114
+ - - exp/svs_stats_raw_phn_None_zh/train/collect_feats/feats.scp
115
+ - feats
116
+ - npy
117
+ valid_data_path_and_name_and_type:
118
+ - - dump24k/raw/dev/text
119
+ - text
120
+ - text
121
+ - - dump24k/raw/dev/wav.scp
122
+ - singing
123
+ - sound
124
+ - - dump24k/raw/dev/label
125
+ - label
126
+ - duration
127
+ - - dump24k/raw/dev/score.scp
128
+ - score
129
+ - score
130
+ - - exp/svs_stats_raw_phn_None_zh/valid/collect_feats/pitch.scp
131
+ - pitch
132
+ - npy
133
+ - - exp/svs_stats_raw_phn_None_zh/valid/collect_feats/feats.scp
134
+ - feats
135
+ - npy
136
+ allow_variable_data_keys: false
137
+ max_cache_size: 0.0
138
+ max_cache_fd: 32
139
+ allow_multi_rates: false
140
+ valid_max_cache_size: null
141
+ exclude_weight_decay: false
142
+ exclude_weight_decay_conf: {}
143
+ optim: adam
144
+ optim_conf:
145
+ lr: 0.001
146
+ eps: 1.0e-06
147
+ weight_decay: 0.0
148
+ scheduler: null
149
+ scheduler_conf: {}
150
+ token_list:
151
+ - <blank>
152
+ - <unk>
153
+ - SP
154
+ - i
155
+ - AP
156
+ - e
157
+ - y
158
+ - d
159
+ - w
160
+ - sh
161
+ - ai
162
+ - n
163
+ - x
164
+ - j
165
+ - ian
166
+ - u
167
+ - l
168
+ - h
169
+ - b
170
+ - o
171
+ - zh
172
+ - an
173
+ - ou
174
+ - m
175
+ - q
176
+ - z
177
+ - en
178
+ - g
179
+ - ing
180
+ - ei
181
+ - ao
182
+ - ang
183
+ - uo
184
+ - eng
185
+ - t
186
+ - a
187
+ - ong
188
+ - ui
189
+ - k
190
+ - f
191
+ - r
192
+ - iang
193
+ - ch
194
+ - v
195
+ - in
196
+ - iao
197
+ - ie
198
+ - iu
199
+ - c
200
+ - s
201
+ - van
202
+ - p
203
+ - ve
204
+ - uan
205
+ - uang
206
+ - ia
207
+ - ua
208
+ - uai
209
+ - un
210
+ - er
211
+ - vn
212
+ - iong
213
+ - <sos/eos>
214
+ odim: null
215
+ model_conf: {}
216
+ use_preprocessor: true
217
+ token_type: phn
218
+ bpemodel: null
219
+ non_linguistic_symbols: null
220
+ cleaner: null
221
+ g2p: null
222
+ fs: 24000
223
+ score_feats_extract: syllable_score_feats
224
+ score_feats_extract_conf:
225
+ fs: 24000
226
+ n_fft: 2048
227
+ win_length: 1200
228
+ hop_length: 300
229
+ feats_extract: fbank
230
+ feats_extract_conf:
231
+ n_fft: 2048
232
+ hop_length: 300
233
+ win_length: 1200
234
+ fs: 24000
235
+ fmin: 80
236
+ fmax: 7600
237
+ n_mels: 80
238
+ normalize: global_mvn
239
+ normalize_conf:
240
+ stats_file: exp/svs_stats_raw_phn_None_zh/train/feats_stats.npz
241
+ svs: naive_rnn_dp
242
+ svs_conf:
243
+ midi_dim: 129
244
+ embed_dim: 512
245
+ duration_dim: 512
246
+ eprenet_conv_layers: 0
247
+ eprenet_conv_chans: 256
248
+ eprenet_conv_filts: 3
249
+ elayers: 3
250
+ eunits: 256
251
+ ebidirectional: true
252
+ midi_embed_integration_type: add
253
+ dlayers: 2
254
+ dunits: 256
255
+ dbidirectional: true
256
+ postnet_layers: 5
257
+ postnet_chans: 512
258
+ postnet_filts: 5
259
+ use_batch_norm: true
260
+ reduction_factor: 1
261
+ eprenet_dropout_rate: 0.2
262
+ edropout_rate: 0.1
263
+ ddropout_rate: 0.1
264
+ postnet_dropout_rate: 0.5
265
+ init_type: pytorch
266
+ use_masking: true
267
+ pitch_extract: dio
268
+ pitch_extract_conf:
269
+ use_token_averaged_f0: false
270
+ fs: 24000
271
+ n_fft: 2048
272
+ hop_length: 300
273
+ f0max: 800
274
+ f0min: 80
275
+ reduction_factor: 1
276
+ pitch_normalize: global_mvn
277
+ pitch_normalize_conf:
278
+ stats_file: exp/svs_stats_raw_phn_None_zh/train/pitch_stats.npz
279
+ ying_extract: null
280
+ ying_extract_conf: {}
281
+ energy_extract: null
282
+ energy_extract_conf: {}
283
+ energy_normalize: null
284
+ energy_normalize_conf: {}
285
+ required:
286
+ - output_dir
287
+ - token_list
288
+ version: '202310'
289
+ distributed: false
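Note on `token_list` with `token_type: phn`: the list order matters, because the model consumes integer IDs rather than phoneme strings. The sketch below assumes, as is conventional for ESPnet2 token lists, that a token's ID is its position in the list and that unknown symbols map to `<unk>`. It uses only the first ten entries copied from the config, and `encode` is a hypothetical helper, not an ESPnet function.

```python
# First ten entries of token_list, copied in order from config.yaml.
# The real list continues through "<sos/eos>".
TOKEN_LIST_HEAD = ["<blank>", "<unk>", "SP", "i", "AP", "e", "y", "d", "w", "sh"]

# ASSUMPTION: a token's ID is its index in token_list.
token_to_id = {token: idx for idx, token in enumerate(TOKEN_LIST_HEAD)}
UNK_ID = token_to_id["<unk>"]


def encode(phonemes):
    """Map phoneme strings to integer IDs, falling back to <unk>."""
    return [token_to_id.get(p, UNK_ID) for p in phonemes]


ids = encode(["sh", "i", "SP", "zz"])  # "zz" is not in the list
print(ids)
```

Because the mapping is positional, reordering or editing `token_list` would silently change what each ID means to the trained weights, so the list should be used exactly as shipped.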
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/backward_time.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/clip.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/duration_loss.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/forward_time.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/gpu_max_cached_mem_GB.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/grad_norm.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/iter_time.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/l1_loss.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/loss.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/loss_scale.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/optim0_lr0.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/optim_step_time.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/images/train_time.png ADDED
exp/svs_train_naive_rnn_dp_raw_phn_None_zh/valid.loss.ave_2best.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:71cbf051e374b96baf66fb6c691a2768a07bdf2e83e271fce660cb0e7a287821
3
+ size 86947295
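Note on the checkpoint name: `valid.loss.ave_2best.pth`, together with `keep_nbest_models: 2` and the `valid`/`loss`/`min` criterion in the config, suggests a parameter-wise average of the two checkpoints with the lowest validation loss. The sketch below illustrates that idea with plain Python lists standing in for tensors. It is an assumption about what the name implies, not ESPnet's actual averaging code, and `average_checkpoints` and the parameter names are hypothetical.

```python
def average_checkpoints(checkpoints):
    """Return the element-wise mean of parameter vectors sharing the same keys."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    keys = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != keys:
            raise ValueError("checkpoints have different parameter names")
    n = len(checkpoints)
    return {
        key: [sum(values) / n for values in zip(*(c[key] for c in checkpoints))]
        for key in keys
    }


# Two toy "checkpoints"; real ones map parameter names to tensors.
best_a = {"encoder.weight": [0.0, 2.0], "decoder.bias": [1.0]}
best_b = {"encoder.weight": [1.0, 4.0], "decoder.bias": [3.0]}

averaged = average_checkpoints([best_a, best_b])
print(averaged)
```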
meta.yaml ADDED
@@ -0,0 +1,8 @@
1
+ espnet: '202310'
2
+ files:
3
+ model_file: exp/svs_train_naive_rnn_dp_raw_phn_None_zh/valid.loss.ave_2best.pth
4
+ python: "3.9.16 (main, Mar 8 2023, 14:00:05) \n[GCC 11.2.0]"
5
+ timestamp: 1702743492.700535
6
+ torch: 1.13.1+cu117
7
+ yaml_files:
8
+ train_config: exp/svs_train_naive_rnn_dp_raw_phn_None_zh/config.yaml
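Note on `meta.yaml`: the `timestamp` field is a bare float. Assuming it is seconds since the Unix epoch (an assumption, since the file does not state the unit), it can be turned into a readable UTC time with the standard library:

```python
from datetime import datetime, timezone

TIMESTAMP = 1702743492.700535  # copied from meta.yaml

# ASSUMPTION: the value is seconds since 1970-01-01T00:00:00Z.
packed_at = datetime.fromtimestamp(TIMESTAMP, tz=timezone.utc)
print(packed_at.isoformat())
```

Read this way, the model was packed in December 2023, which is consistent with the `espnet: '202310'` version recorded in the same file.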