# FastSpeech2 multi-speaker, English language

## Prepare
Run everything from the main repo folder, i.e. TensorFlowTTS/.

0. Optional* [Download](http://www.openslr.org/60/) and prepare LibriTTS (a helper notebook for preparing it is in examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb)
- Dataset structure after finishing this step:
    ```
    |- TensorFlowTTS/
    |   |- LibriTTS/
    |   |-  |- train-clean-100/
    |   |-  |- SPEAKERS.txt
    |   |-  |- ...
    |   |- libritts/
    |   |-  |- 200/
    |   |-  |-  |- 200_124139_000001_000000.txt
    |   |-  |-  |- 200_124139_000001_000000.wav
    |   |-  |-  |- ...
    |   |-  |- 250/
    |   |-  |- ...
    |   |- tensorflow_tts/
    |       |- models/
    |       |- ...
    ``` 
1. Extract durations (use examples/mfa_extraction or a pretrained Tacotron2)
2. Optional* Build the Docker image
- ```
  bash examples/fastspeech2_libritts/scripts/build.sh
  ```
3. Optional* Run the Docker container
- ```
  bash examples/fastspeech2_libritts/scripts/interactive.sh
  ```
4. Preprocessing:
- ```
  tensorflow-tts-preprocess --rootdir ./libritts \
    --outdir ./dump_libritts \
    --config preprocess/libritts_preprocess.yaml \
    --dataset libritts
  ```

5. Normalization:
- ```
  tensorflow-tts-normalize --rootdir ./dump_libritts \
    --outdir ./dump_libritts \
    --config preprocess/libritts_preprocess.yaml \
    --dataset libritts
  ```

6. Change the CharactorDurationF0EnergyMelDataset speaker mapper in fastspeech2_dataset to match your dataset (if you use LibriTTS with mfa_extraction, you don't need to change anything)
7. Change train_libri.sh to match your dataset and run:
- ```
  bash examples/fastspeech2_libritts/scripts/train_libri.sh
  ```
8. Optional* If you run into tensor size mismatches, check step 5 in the `examples/mfa_extraction` directory
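
For step 6, a speaker mapper is essentially a lookup from speaker name to an integer ID. The sketch below shows one way such a mapping could be built from the speaker folder names under `libritts/`. `build_speaker_map` is a hypothetical helper written for illustration, not part of TensorFlowTTS, and the actual mapper in fastspeech2_dataset may be structured differently.

```python
def build_speaker_map(speaker_names):
    """Map each unique speaker name to a stable integer ID.

    Hypothetical helper for illustration; not a TensorFlowTTS API.
    Names are sorted so the IDs do not depend on input order.
    """
    unique_names = sorted(set(speaker_names))
    return {name: idx for idx, name in enumerate(unique_names)}


# Speaker folder names as in the dataset structure above (e.g. libritts/200/).
speaker_map = build_speaker_map(["250", "200", "200"])
print(speaker_map)
```

Sorting before numbering keeps the IDs reproducible between runs, which matters if the same mapping has to be reused at inference time.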

## Comments

This version uses the popular train.txt format with '|'-separated fields, as used in other repos. The training file should look like this:

Wav Path | Text | Speaker Name

Wav Path2 | Text | Speaker Name
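
As a rough illustration of the layout above, one line of train.txt could be split like this. This is a minimal sketch, not the loader TensorFlowTTS uses: `parse_train_line` is a hypothetical name, and it assumes the text itself contains no '|' and that whitespace around the separators can be stripped.

```python
def parse_train_line(line):
    """Split one train.txt line into (wav_path, text, speaker_name).

    Hypothetical helper; assumes exactly three '|'-separated fields
    and that the text field itself contains no '|'.
    """
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != 3:
        raise ValueError(
            f"expected 3 '|'-separated fields, got {len(fields)}: {line!r}"
        )
    wav_path, text, speaker_name = fields
    return wav_path, text, speaker_name


# Made-up example line; the path and text are placeholders.
example = "libritts/200/200_124139_000001_000000.wav | Hello world. | 200"
print(parse_train_line(example))
```

Raising on a wrong field count surfaces malformed lines early, before they reach preprocessing.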