In the name of God, the Most Gracious, the Most Merciful: the key to the door of the Sage's treasure.

Still being completed, use with caution...

How to train HiFi-GAN

HiFi-GAN is a neural model that converts mel-spectrograms to voice REF. In effect, HiFi-GAN learns how to add phase information to mel-spectrogram data.

This phase information is mostly unrelated to the spoken language, so almost everyone uses the original models trained on the multi-speaker English dataset VCTK.

So when do you need to train HiFi-GAN? When you need a sampling rate other than the 22050 Hz used by the original model.
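Before deciding to train, it can help to confirm which sampling rate the recordings actually use. The following is a minimal sketch, assuming the dataset is a flat folder of uncompressed PCM .wav files; the function name collect_sample_rates is hypothetical and not part of HiFi-GAN.

```python
import wave
from collections import Counter
from pathlib import Path


def collect_sample_rates(wav_dir):
    """Count how many .wav files in wav_dir use each sampling rate (in Hz)."""
    rates = Counter()
    for wav_path in sorted(Path(wav_dir).glob("*.wav")):
        # The stdlib wave module only reads the header of PCM WAV files,
        # so this stays fast even on large datasets.
        with wave.open(str(wav_path), "rb") as wav_file:
            rates[wav_file.getframerate()] += 1
    return rates
```

For example, collect_sample_rates("/home/oem/Basir/TTS/HiFi-GAN/MelDataset/wav") returns a mapping from sampling rate to file count. If the only key is 22050, the original model already matches the data.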

Attention: To train this model you need at least a 3090 graphics card running for TWO WEEKS!

Install requirements

git clone git@github.com:jik876/hifi-gan.git --depth 1

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.7
sudo apt install python3.7-venv

python3.7 -m venv hifi-gan-env
source hifi-gan-env/bin/activate

cd hifi-gan
pip install -r requirements.txt
pip install protobuf==3.20.3
pip install librosa==0.8.1 numba==0.52.0
pip uninstall torch
pip install torch --extra-index-url https://download.pytorch.org/whl/cu122
pip install --upgrade numpy scipy librosa tensorboard soundfile matplotlib

Prepare dataset

Create the dataset using Matcha:

source matcha-tts-env/bin/activate
cd /home/oem/Basir/TTS/HiFi-GAN/MelDataset/mels/

Remove the first column from metadata.csv and save the result as metadata_raw.txt, then run:

matcha-tts --file /home/oem/Basir/TTS/Datasets/Phone-Online/Female/metadata_raw.txt --checkpoint_path /home/oem/Basir/TTS/Matcha/Trained/inital_checkpoints/phone-24000-motahare.ckpt --vocoder hifigan_univ_v1 --denoiser_strength 0.000001

Run rename_utterance_files_to_meta_data_1st_column_name.py, then delete the .png files:

rm -rf ./*.png
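Removing the first column from metadata.csv can be scripted. This is a minimal sketch, assuming the file is pipe-separated with the utterance name in the first field (LJSpeech style); both the delimiter and the function name drop_first_column are assumptions, so adjust them to the actual file. If the file has more than two columns, everything after the first delimiter is kept.

```python
def drop_first_column(src_path, dst_path, delimiter="|"):
    """Copy src_path to dst_path without the first delimited field of each line.

    Returns the number of lines written. Empty lines and lines that do not
    contain the delimiter are skipped.
    """
    written = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            # partition splits at the first delimiter only, so any later
            # delimiter inside the text is left untouched.
            _, sep, rest = line.partition(delimiter)
            if not sep:
                continue
            dst.write(rest + "\n")
            written += 1
    return written
```

A call such as drop_first_column("metadata.csv", "metadata_raw.txt") would then produce the text file passed to matcha-tts with --file.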

YOU MIGHT GET AN ERROR MIDWAY AND NEED TO REDO PART OF IT

Download universal_v1 from: https://drive.google.com/drive/folders/1-eEYTB5Av9jNql0WGBlRoi-WH2J7bp5Y?usp=sharing
Unzip it inside the cp_hifigan folder to use it as the starting point (versions v2 and v3 do not exist there).

In train.txt and val.txt, remove ".wav" from the paths.
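The extension can be stripped with a few lines of Python. This is a minimal sketch, assuming each line starts with the file path and that any transcript follows after a "|" separator; the function name strip_wav_extension is hypothetical. Only the first field is touched, so a ".wav" that appears later in a line is left alone.

```python
def strip_wav_extension(list_path, extension=".wav"):
    """Rewrite list_path in place, removing extension from the first field.

    Returns the cleaned lines.
    """
    with open(list_path, encoding="utf-8") as f:
        lines = f.read().splitlines()

    cleaned = []
    for line in lines:
        # Split off the path (first field) and keep the rest unchanged.
        name, sep, rest = line.partition("|")
        if name.endswith(extension):
            name = name[: -len(extension)]
        cleaned.append(name + sep + rest)

    with open(list_path, "w", encoding="utf-8") as f:
        f.write("\n".join(cleaned) + ("\n" if cleaned else ""))
    return cleaned
```

Calling strip_wav_extension("train.txt") and strip_wav_extension("val.txt") applies the change to both lists. The files are overwritten, so keep a copy if the originals are still needed.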

In config_v1.json, set "batch_size": 8.
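The batch size can be edited by hand, or changed with a short script when several configs need the same edit. This is a minimal sketch; the helper name set_config_value is hypothetical, and it assumes config_v1.json is plain JSON with "batch_size" as a top-level key.

```python
import json


def set_config_value(config_path, key, value):
    """Set one top-level key in a JSON config file and return the new config."""
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)

    # Refuse unknown keys so that a typo does not silently add a new entry.
    if key not in config:
        raise KeyError(f"{key!r} not found in {config_path}")
    config[key] = value

    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
        f.write("\n")
    return config
```

For the step above the call would be set_config_value("config_v1.json", "batch_size", 8).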

NOTE: The initial model is taken from /home/oem/Basir/TTS/HiFi-GAN/hifi-gan/cp_hifigan/. Remove the contents of that folder if no initial model is available.

Train:

python3 train.py --config config_v1.json \
    --input_wavs_dir /home/oem/Basir/TTS/HiFi-GAN/MelDataset/wav \
    --input_training_file /home/oem/Basir/TTS/HiFi-GAN/MelDataset/train.txt \
    --input_validation_file /home/oem/Basir/TTS/HiFi-GAN/MelDataset/val.txt \
    --checkpoint_interval 10000

Optional flags that are not needed:

--input_mels_dir /home/oem/Basir/TTS/HiFi-GAN/MelDataset/mels   # Not needed; synthetic mels give bad results
--fine_tuning True   # Works without it
--checkpoint_path cp_hifigan   # Works without it

Test:

python inference_e2e.py \
    --checkpoint_file /home/oem/Basir/TTS/HiFi-GAN/hifi-gan/cp_hifigan/g_02500060 \
    --input_mels_dir /home/oem/Basir/TTS/HiFi-GAN/MelDataset/mels/ \
    --output_dir /home/oem/Basir/TTS/HiFi-GAN/MelDataset/new_wavs/

WORKS ONLY WITH V1

Or replace the model named hifigan_T2_v1 in /home/oem/.local/share/matcha_tts/ and run:

matcha-tts --file /home/oem/Basir/TTS/HiFi-GAN/MelDataset/metadata_raw.txt --checkpoint_path /home/oem/Basir/TTS/Matcha/Trained/inital_checkpoints/phone-24000-motahare.ckpt --vocoder /home/oem/Basir/TTS/HiFi-GAN/Trained/g_00050000_24KHz_v1_phonedataset_motahare --denoiser_strength 0.00025000 --sample_rate 24000

Convert to ONNX

pip install onnx onnxruntime

DOES NOT WORK

export PYTHONPATH=$PYTHONPATH:/home/oem/Basir/TTS/HiFi-GAN/hifi-gan