Speech Synthesis (S^2)
===
[https://arxiv.org/abs/2109.06912](https://arxiv.org/abs/2109.06912)
Speech synthesis with fairseq.
## Features
- Autoregressive and non-autoregressive models
- Multi-speaker synthesis
- Audio preprocessing (denoising, VAD, etc.) for less curated data
- Automatic metrics for model development
- Data configuration similar to that of [S2T](../speech_to_text/README.md)
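
To make the audio preprocessing item above more concrete, here is a minimal sketch of energy-based voice activity detection (VAD) used to trim leading and trailing silence. It is not the fairseq implementation: the function names, the 160-sample frame length, and the `1e-3` energy threshold are all hypothetical choices made for illustration, and real pipelines typically use more robust detectors.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each complete, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_vad(samples, frame_len=160, threshold=1e-3):
    """One boolean per frame: True where the frame energy exceeds the threshold.

    frame_len and threshold are illustrative defaults, not tuned values.
    """
    return [e > threshold for e in frame_energies(samples, frame_len)]


def trim_silence(samples, frame_len=160, threshold=1e-3):
    """Keep only the span from the first to the last frame flagged as speech.

    Samples after the last complete frame are ignored by the detector.
    """
    flags = energy_vad(samples, frame_len, threshold)
    if not any(flags):
        return []
    first = flags.index(True)
    last = len(flags) - 1 - flags[::-1].index(True)
    return samples[first * frame_len:(last + 1) * frame_len]


# Synthetic input: 20 ms of silence, 20 ms of a 440 Hz tone, 20 ms of silence,
# assuming a 16 kHz sampling rate.
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]
signal = silence + tone + silence
trimmed = trim_silence(signal)
```

A fixed threshold is the simplest possible design; it works on clean synthetic input like the one above but would need adapting (for example, a noise-floor estimate) for less curated recordings.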
## Examples
- [Single-speaker synthesis on LJSpeech](docs/ljspeech_example.md)
- [Multi-speaker synthesis on VCTK](docs/vctk_example.md)
- [Multi-speaker synthesis on Common Voice](docs/common_voice_example.md)
## Citation
Please cite as:
```
@article{wang2021fairseqs2,
title={fairseq S$^2$: A Scalable and Integrable Speech Synthesis Toolkit},
author={Wang, Changhan and Hsu, Wei-Ning and Adi, Yossi and Polyak, Adam and Lee, Ann and Chen, Peng-Jen and Gu, Jiatao and Pino, Juan},
journal={arXiv preprint arXiv:2109.06912},
year={2021}
}
@inproceedings{ott2019fairseq,
title = {fairseq: A Fast, Extensible Toolkit for Sequence Modeling},
author = {Myle Ott and Sergey Edunov and Alexei Baevski and Angela Fan and Sam Gross and Nathan Ng and David Grangier and Michael Auli},
booktitle = {Proceedings of NAACL-HLT 2019: Demonstrations},
year = {2019},
}
```