Speech Synthesis (S^2)
===
[https://arxiv.org/abs/2109.06912](https://arxiv.org/abs/2109.06912)

Speech synthesis with fairseq.

## Features

- Autoregressive and non-autoregressive models
- Multi-speaker synthesis
- Audio preprocessing (denoising, VAD, etc.) for less curated data
- Automatic metrics for model development
- Data configuration similar to [S2T](../speech_to_text/README.md)

## Examples

- [Single-speaker synthesis on LJSpeech](docs/ljspeech_example.md)
- [Multi-speaker synthesis on VCTK](docs/vctk_example.md)
- [Multi-speaker synthesis on Common Voice](docs/common_voice_example.md)

## Citation

Please cite as:

```
@article{wang2021fairseqs2,
  title={fairseq S\^{} 2: A Scalable and Integrable Speech Synthesis Toolkit},
  author={Wang, Changhan and Hsu, Wei-Ning and Adi, Yossi and Polyak, Adam and Lee, Ann and Chen, Peng-Jen and Gu, Jiatao and Pino, Juan},
  journal={arXiv preprint arXiv:2109.06912},
  year={2021}
}

@inproceedings{ott2019fairseq,
  title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
  author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
  booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
  year = {2019},
}
```