---
title: README
emoji: 🦀
colorFrom: pink
colorTo: gray
sdk: static
pinned: false
---

# [TTSDS Benchmark](https://ttsdsbenchmark.com)

Recent Text-to-Speech (TTS) models have shown that synthetic audio can come close to real human speech.
Traditional evaluation methods for TTS systems, however, need an update to keep pace with these developments.
Our TTSDS benchmark assesses the quality of synthetic speech across several factors, including prosody, speaker identity, and intelligibility.
Each factor is compared against both real speech and noise datasets, which shows how close synthetic speech comes to real speech.
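
The comparison against real speech and noise can be illustrated with a small sketch. It is not the benchmark's implementation: the function names, the 1-D Wasserstein distance over equal-sized samples, the normalization to a 0 to 100 range, and the feature values are all assumptions made for illustration. The paper and repository define the actual features and scoring.

```python
from statistics import mean


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-sized 1-D samples."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal-sized samples, the distance is the mean gap between sorted values.
    return mean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def factor_score(synthetic, real, noise):
    """Hypothetical score in [0, 100]: higher means closer to real speech than to noise."""
    d_real = wasserstein_1d(synthetic, real)
    d_noise = wasserstein_1d(synthetic, noise)
    if d_real + d_noise == 0:
        raise ValueError("real and noise references are indistinguishable")
    # Assumed normalization: 100 when matching real speech, 0 when matching noise.
    return 100.0 * d_noise / (d_real + d_noise)


# Made-up 1-D feature values for a single factor (e.g. a prosody feature).
real = [1.0, 2.0, 3.0, 4.0]
noise = [11.0, 12.0, 13.0, 14.0]
synthetic = [2.0, 3.0, 4.0, 5.0]

score = factor_score(synthetic, real, noise)
print(f"{score:.1f}")  # prints 90.0
```

In this toy setup the synthetic features sit one unit from the real reference and nine units from the noise reference, so the score lands near the top of the range.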

## More information
More details can be found in our paper [*TTSDS -- Text-to-Speech Distribution Score*](https://arxiv.org/abs/2407.12707).

## Reproducibility
To reproduce our results, see our [repository](https://github.com/ttsds/ttsds).

## Citation

```
@misc{minixhofer2024ttsds,
      title={TTSDS -- Text-to-Speech Distribution Score},
      author={Christoph Minixhofer and Ondřej Klejch and Peter Bell},
      year={2024},
      eprint={2407.12707},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2407.12707},
}
```