Commit 3f277ef by cdminix (verified) · 1 parent: 38da468

Update README.md

  1. README.md +39 -1
README.md CHANGED
@@ -7,4 +7,42 @@ sdk: static
  pinned: false
  ---
 
- Edit this `README.md` markdown file to author your organization card.
+ # TTSDS Benchmark
+
+ As many recent Text-to-Speech (TTS) models have shown, synthetic audio can be close to real human speech.
+ However, traditional evaluation methods for TTS systems have not kept pace with these developments.
+ Our TTSDS benchmark assesses the quality of synthetic speech by considering factors like prosody, speaker identity, and intelligibility.
+ By comparing these factors against both real speech and noise datasets, we can quantify how closely synthetic speech resembles real speech.
+
+ ## More information
+ More details can be found in our paper [*TTSDS -- Text-to-Speech Distribution Score*](https://arxiv.org/abs/2407.12707).
+
+ ## Reproducibility
+ To reproduce our results, see our [GitHub repository](https://github.com/ttsds/ttsds).
+
+ ## Credits
+
+
+ This benchmark is inspired by [TTS Arena](https://huggingface.co/spaces/TTS-AGI/TTS-Arena), which instead focuses on the subjective evaluation of TTS models.
+ Our benchmark would not be possible without the many open-source TTS models on Hugging Face and GitHub.
+ Additionally, our benchmark uses the following datasets:
+ - [LJSpeech](https://keithito.com/LJ-Speech-Dataset/)
+ - [LibriTTS](https://www.openslr.org/60/)
+ - [VCTK](https://datashare.ed.ac.uk/handle/10283/2950)
+ - [Common Voice](https://commonvoice.mozilla.org/)
+ - [ESC-50](https://github.com/karolpiczak/ESC-50)
+ It also uses the following metrics, representations, and tools:
+ - [Wav2Vec2](https://arxiv.org/abs/2006.11477)
+ - [HuBERT](https://arxiv.org/abs/2106.07447)
+ - [WavLM](https://arxiv.org/abs/2110.13900)
+ - [PESQ](https://en.wikipedia.org/wiki/Perceptual_Evaluation_of_Speech_Quality)
+ - [VoiceFixer](https://arxiv.org/abs/2204.05841)
+ - [WADA SNR](https://www.cs.cmu.edu/~robust/Papers/KimSternIS08.pdf)
+ - [Whisper](https://arxiv.org/abs/2212.04356)
+ - [Masked Prosody Model](https://huggingface.co/cdminix/masked_prosody_model)
+ - [PyWorld](https://github.com/JeremyCCHsu/Python-Wrapper-for-World-Vocoder)
+ - [WeSpeaker](https://arxiv.org/abs/2210.17016)
+ - [D-Vector](https://github.com/yistLin/dvector)
+
+ Authors: Christoph Minixhofer, Ondřej Klejch, and Peter Bell
+ of the University of Edinburgh.
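
The README introduction describes comparing a factor of synthetic speech against both real speech and noise datasets. The following is a minimal sketch of that idea, not the benchmark's actual implementation: it assumes each utterance is summarized by a single scalar feature (for example a mean pitch value), uses a simple 1-D Wasserstein distance, and normalizes the two distances into a 0 to 100 score. The function names `wasserstein_1d` and `factor_score`, the toy data, and the exact normalization are illustrative assumptions; the paper linked in the README defines the real procedure.

```python
import random


def wasserstein_1d(a, b):
    """1-Wasserstein distance between two equal-sized 1-D samples.

    For equal-sized samples this is the mean absolute difference
    between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-sized samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def factor_score(synthetic, real, noise):
    """Hypothetical 0-100 score for one factor (assumed normalization).

    Close to 100 when the synthetic feature distribution sits near the
    real one, close to 0 when it sits near the noise one.
    """
    d_real = wasserstein_1d(synthetic, real)
    d_noise = wasserstein_1d(synthetic, noise)
    if d_real + d_noise == 0:
        return 50.0  # all three samples are indistinguishable
    return 100.0 * d_noise / (d_real + d_noise)


# Toy data standing in for per-utterance feature values (assumed, not real data).
rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(1000)]
noise = [rng.uniform(-10.0, 10.0) for _ in range(1000)]
synthetic = [rng.gauss(0.2, 1.0) for _ in range(1000)]

score = factor_score(synthetic, real, noise)
print(f"toy factor score: {score:.1f}")
```

In this toy setup the synthetic sample is drawn from a distribution close to the real one, so its score lands much nearer 100 than 0. A sample identical to the noise data would score exactly 0.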