Introduction to configurations
Dataset
dataset: # the dataset part is for training only
  train:
    wav_scp: './train/wav.scp'
    mel_scp: './train/mel.scp'
    dur_scp: './train/dur.scp'
    emb_type1:
      _name: 'pinyin'
      scp: './train/py.scp'
      vocab: 'py.vocab'
    emb_type2:
      _name: 'graphic'
      scp: './train/gp.scp'
      vocab: 'gp.vocab'
    # emb_type3:
    #   _name: 'speaker'
    #   scp: './train/spk.scp'
    #   vocab: # doesn't need vocab
    emb_type4:
      _name: 'prosody'
      scp: './train/psd.scp'
      vocab:
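To make the layout of the `dataset` section concrete, here is a minimal sketch of how the `emb_type*` entries could be enumerated once the YAML has been loaded into a dict (for example with PyYAML's `yaml.safe_load`). The helper name `list_embeddings` is hypothetical and is not part of this project; the dict simply mirrors the config above, with the commented-out speaker entry omitted and the empty `vocab:` represented as `None`, which is how a YAML loader reads an empty value.

```python
# Minimal sketch, not the project's actual loader.
# The dict mirrors the `dataset` section as a YAML loader would return it.
dataset_cfg = {
    "train": {
        "wav_scp": "./train/wav.scp",
        "mel_scp": "./train/mel.scp",
        "dur_scp": "./train/dur.scp",
        "emb_type1": {"_name": "pinyin", "scp": "./train/py.scp", "vocab": "py.vocab"},
        "emb_type2": {"_name": "graphic", "scp": "./train/gp.scp", "vocab": "gp.vocab"},
        # emb_type3 (speaker) is commented out in the config, so it is absent here.
        "emb_type4": {"_name": "prosody", "scp": "./train/psd.scp", "vocab": None},
    }
}


def list_embeddings(train_cfg):
    """Return (name, scp, vocab) for every emb_type* entry, in config order.

    Hypothetical helper for illustration only.
    """
    entries = []
    for key, emb in train_cfg.items():
        if key.startswith("emb_type"):
            entries.append((emb["_name"], emb["scp"], emb.get("vocab")))
    return entries


for name, scp, vocab in list_embeddings(dataset_cfg["train"]):
    print(f"{name}: scp={scp}, vocab={vocab if vocab else '(none)'}")
```

Because the entries are discovered by their `emb_type` prefix, commenting an entry out of the YAML (as with the speaker embedding) is enough to drop it from this listing.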
Vocoder
vocoder:
  type: VocGan # choose one of the following
  MelGAN:
    checkpoint: ~/checkpoints/melgan/melgan_ljspeech.pth
    config: ~/checkpoints/melgan/default.yaml
    device: cpu
  VocGan:
    checkpoint: ~/checkpoints/vctk_pretrained_model_3180.pt # ~/checkpoints/ljspeech_29de09d_4000.pt
    denoise: True
    device: cpu
  HiFiGAN:
    checkpoint: ~/checkpoints/VCTK_V3/generator_v3 # you need to download checkpoint and set the params here
    device: cpu
  Waveglow:
    checkpoint: ~/checkpoints/waveglow_256channels_universal_v5_state_dict.pt
    sigma: 1.0
    denoiser_strength: 0.0 # try 0.1
    device: cpu # try cpu if out of memory
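The `type` key names which of the sibling sub-sections is in use. The sketch below shows one way that selection could work, assuming the YAML has already been loaded into a dict. The helper `active_vocoder` is hypothetical, and expanding `~` with `os.path.expanduser` is an assumption made here for illustration, not something the project is confirmed to do.

```python
import os

# Minimal sketch, not the project's actual code.
# The dict mirrors the `vocoder` section as a YAML loader would return it.
vocoder_cfg = {
    "type": "VocGan",
    "MelGAN": {
        "checkpoint": "~/checkpoints/melgan/melgan_ljspeech.pth",
        "config": "~/checkpoints/melgan/default.yaml",
        "device": "cpu",
    },
    "VocGan": {
        "checkpoint": "~/checkpoints/vctk_pretrained_model_3180.pt",
        "denoise": True,
        "device": "cpu",
    },
    "HiFiGAN": {
        "checkpoint": "~/checkpoints/VCTK_V3/generator_v3",
        "device": "cpu",
    },
    "Waveglow": {
        "checkpoint": "~/checkpoints/waveglow_256channels_universal_v5_state_dict.pt",
        "sigma": 1.0,
        "denoiser_strength": 0.0,
        "device": "cpu",
    },
}


def active_vocoder(cfg):
    """Return (name, params) for the vocoder selected by cfg['type'].

    Hypothetical helper for illustration only.
    """
    name = cfg["type"]
    section = cfg.get(name)
    if not isinstance(section, dict):
        raise ValueError(f"vocoder type {name!r} has no matching sub-section")
    params = dict(section)  # copy so the loaded config is left untouched
    for key in ("checkpoint", "config"):
        if key in params:
            # Assumption: paths starting with ~ refer to the user's home directory.
            params[key] = os.path.expanduser(params[key])
    return name, params


name, params = active_vocoder(vocoder_cfg)
print(name, params)
```

With this layout, switching vocoders means changing only the `type` value; the parameters of the unused vocoders can stay in the file.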
Make your own changes
Two config files are provided in the examples for illustration. You can change a config file if you know what you are doing. For example, you can remove speaker_emb from the following section, or add a prosody embedding if you have prosody labels (as in the biaobei dataset).