# FastSpeech2 multi-speaker (English)

## Prepare
Run every command from the main repo folder, TensorFlowTTS/.

0. Optional* [Download](http://www.openslr.org/60/) and prepare LibriTTS (a helper notebook for preparing it is in examples/fastspeech2_libritts/libri_experiment/prepare_libri.ipynb)
- Dataset structure after finishing this step:
    ```
    |- TensorFlowTTS/
    |   |- LibriTTS/
    |   |-  |- train-clean-100/
    |   |-  |- SPEAKERS.txt
    |   |-  |- ...
    |   |- libritts/
    |   |-  |- 200/
    |   |-  |-  |- 200_124139_000001_000000.txt
    |   |-  |-  |- 200_124139_000001_000000.wav
    |   |-  |-  |- ...
    |   |-  |- 250/
    |   |-  |- ...
    |   |- tensorflow_tts/
    |       |- models/
    |       |- ...
    ``` 
1. Extract durations (use examples/mfa_extraction or a pretrained Tacotron2)
2. Optional* Build the Docker image
- ```
  bash examples/fastspeech2_libritts/scripts/build.sh
  ```
3. Optional* Run the Docker container
- ```
  bash examples/fastspeech2_libritts/scripts/interactive.sh
  ```
4. Preprocessing:
- ```
  tensorflow-tts-preprocess --rootdir ./libritts \
    --outdir ./dump_libritts \
    --config preprocess/libritts_preprocess.yaml \
    --dataset libritts
  ```

5. Normalization:
- ```
  tensorflow-tts-normalize --rootdir ./dump_libritts \
    --outdir ./dump_libritts \
    --config preprocess/libritts_preprocess.yaml \
    --dataset libritts
  ```

6. Change the speaker mapper of CharactorDurationF0EnergyMelDataset in fastspeech2_dataset to match your dataset (if you use LibriTTS with mfa_extraction, you don't need to change anything)
7. Change train_libri.sh to match your dataset and run:
- ```
  bash examples/fastspeech2_libritts/scripts/train_libri.sh
  ```
8. Optional* If you run into tensor size mismatches, check step 5 in the `examples/mfa_extraction` directory
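
For step 6, it may help to see what a speaker mapper could look like for a custom dataset. The following is a minimal sketch, assuming the mapper is a plain dictionary from speaker name to integer ID and that each speaker has its own sub-folder, as in the `libritts/` layout above. The function name `build_speaker_map` is hypothetical and not part of TensorFlowTTS; check fastspeech2_dataset for the structure the dataset class actually expects.

```python
import os


def build_speaker_map(rootdir):
    """Map each speaker sub-folder name to a consecutive integer ID.

    Hypothetical helper: assumes one sub-folder per speaker under rootdir.
    """
    # Sort the folder names so the IDs are stable across runs.
    speakers = sorted(
        name
        for name in os.listdir(rootdir)
        if os.path.isdir(os.path.join(rootdir, name))
    )
    return {name: idx for idx, name in enumerate(speakers)}


if __name__ == "__main__":
    # Assumes ./libritts exists, as produced in step 0.
    print(build_speaker_map("./libritts"))
```

Folder names are sorted as strings, so `"1000"` comes before `"200"`. Keep the same mapping for training and inference, since the IDs depend on which speakers are present.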

## Comments

This version uses the popular '|'-separated train.txt format found in other repos. Training files should look like this:

Wav Path | Text | Speaker Name

Wav Path2 | Text | Speaker Name
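
The format above can be read with a few lines of Python. This is a minimal sketch, assuming exactly three fields per line and no '|' character inside the text itself; the names `Utterance` and `parse_train_lines` are hypothetical and not part of TensorFlowTTS.

```python
from typing import Iterable, List, NamedTuple


class Utterance(NamedTuple):
    wav_path: str
    text: str
    speaker: str


def parse_train_lines(lines: Iterable[str]) -> List[Utterance]:
    """Parse 'Wav Path | Text | Speaker Name' lines into Utterance records."""
    utterances = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        parts = [part.strip() for part in line.split("|")]
        if len(parts) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(parts)}"
            )
        utterances.append(Utterance(*parts))
    return utterances
```

To read a file, pass the open file object: `parse_train_lines(open("train.txt", encoding="utf-8"))`. A malformed line raises `ValueError`, so a bad entry is reported by line number rather than silently skipped.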