### Speaker Encoder

This is an implementation of [Generalized End-to-End Loss for Speaker Verification](https://arxiv.org/abs/1710.10467). The model can be used to compute voice and speaker embeddings.

With the code here you can generate d-vectors for both multi-speaker and single-speaker TTS datasets, then visualise and explore them along with the associated audio files in an interactive chart.

Below is an example showing the embeddings of various speakers. You can generate the same plot with the provided notebook, as demonstrated in [this video](https://youtu.be/KW3oO7JVa7Q).

![](umap.png)

Download a pretrained model from the [Released Models](https://github.com/mozilla/TTS/wiki/Released-Models) page.

To run the code, follow the same flow as in TTS:
- Define `config.json` for your needs. Note that the audio parameters should match those of your TTS model.
- Example training call: ```python speaker_encoder/train.py --config_path speaker_encoder/config.json --data_path ~/Data/Libri-TTS/train-clean-360```
- Generate embedding vectors: ```python speaker_encoder/compute_embeddings.py --use_cuda true /model/path/best_model.pth model/config/path/config.json dataset/path/ output_path```. This script parses all .wav files under the given dataset path and recreates the same folder structure under the output path, containing the generated embedding files.
- Watch training on TensorBoard, as in TTS.
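
To illustrate the folder mirroring described in the embedding step, here is a minimal sketch of how each .wav file under the dataset path could map to an output file under the output path. It is not the code of `compute_embeddings.py`. The function name `mirrored_output_paths` is hypothetical, and the `.npy` suffix for the embedding files is an assumption.

```python
from pathlib import Path


def mirrored_output_paths(dataset_path, output_path, suffix=".npy"):
    """Map every .wav file under dataset_path to a mirrored path under output_path.

    Hypothetical helper for illustration only. The ".npy" suffix is an
    assumption about how the embedding files are named.
    """
    dataset_root = Path(dataset_path)
    output_root = Path(output_path)
    mapping = {}
    # rglob walks the dataset recursively, so nested speaker/book folders are kept.
    for wav_file in sorted(dataset_root.rglob("*.wav")):
        # Path of the file relative to the dataset root, e.g. "spk1/book/a.wav".
        relative = wav_file.relative_to(dataset_root)
        # Same relative location under the output root, with the new suffix.
        mapping[wav_file] = (output_root / relative).with_suffix(suffix)
    return mapping
```

Under these assumptions, calling `mirrored_output_paths("dataset/path/", "output_path")` would pair `dataset/path/spk1/a.wav` with `output_path/spk1/a.npy`, so each embedding can be traced back to its audio file by its relative path.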