- International Phonetic Alphabet
- CTC
- multilingual
---
# Model Card for Wav2Vec2 Large with Common Phone

This is a multilingual phone recognition model fine-tuned on the [Common Phone](https://zenodo.org/records/5846137) dataset.
It was created as part of the PhD thesis of [Philipp Klumpp](https://scholar.google.com/citations?user=IWvgno4AAAAJ) to analyze pathological speech signals.

## Model Details

A Wav2Vec2 model with a linear output layer that projects onto the CTC blank token plus 101 phone symbols from the International Phonetic Alphabet (IPA).
The model takes 16 kHz audio as input and predicts the most probable sequence of uttered IPA phones.
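
To illustrate how 16 kHz audio turns into frame-level CTC outputs and then into a phone sequence, here is a minimal plain-Python sketch. It assumes the default convolutional feature-encoder settings of the Hugging Face `Wav2Vec2Config` (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, i.e. an overall hop of 320 samples or about 20 ms). It also assumes that the blank token has ID 0 and uses a made-up ID-to-phone mapping. None of these details are taken from this model's own configuration or vocabulary, and the greedy decoder shown is not necessarily the one used by the author.

```python
from itertools import groupby

# ASSUMPTION: default feature-encoder settings of Hugging Face's Wav2Vec2Config;
# check the model's own config.json before relying on these values.
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples: int) -> int:
    """Number of frame-level CTC outputs for num_samples of 16 kHz audio."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        if length < kernel:
            return 0  # input too short for this convolution
        # Unpadded 1-D convolution: floor((L - k) / s) + 1
        length = (length - kernel) // stride + 1
    return length


def ctc_greedy_decode(frame_ids: list[int], blank_id: int = 0) -> list[int]:
    """Greedy (best-path) CTC decoding: merge repeated labels, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_ids)]
    return [label for label in collapsed if label != blank_id]


if __name__ == "__main__":
    print(num_output_frames(16_000))  # one second of audio -> 49

    # Hypothetical per-frame argmax IDs; 0 is ASSUMED to be the CTC blank.
    frames = [0, 7, 7, 0, 0, 12, 12, 7]
    # Hypothetical ID-to-IPA mapping; the real vocabulary is different.
    id_to_phone = {7: "a", 12: "t"}
    print(" ".join(id_to_phone[i] for i in ctc_greedy_decode(frames)))  # -> a t a
```

Greedy decoding takes the most probable label in each frame. A beam search over the CTC outputs can find a more probable overall sequence, at a higher computational cost.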

### Model Description

This model was created to analyze pathological speech signals. It was fine-tuned on Common Phone, a multilingual corpus for robust acoustic modelling, which comprises more than 11,000 speakers carefully selected from Mozilla's Common Voice dataset.

Phone error rate (PER) in percent on the test set:

| Language | Test PER (%) |
|:---:|:---:|
| English | 11.0 |
| French | 9.9 |
| German | 9.8 |
| Italian | 9.1 |
| Russian | 6.6 |
| Spanish | 8.8 |
| **Average** | **9.2** |
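
Phone error rate is conventionally computed as the Levenshtein distance (substitutions + deletions + insertions) between predicted and reference phone sequences, divided by the number of reference phones. The sketch below shows that computation, assuming phones are already given as lists of IPA symbols and that errors are pooled over all utterances (corpus-level PER); the exact scoring procedure behind the table is not specified here. The last lines check that the reported average coincides with the unweighted mean of the six per-language values.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    # At the start of iteration i, prev[j] is the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs: list[list[str]], hyps: list[list[str]]) -> float:
    """Corpus-level PER in percent: total edits / total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("reference transcriptions contain no phones")
    return 100.0 * errors / total


if __name__ == "__main__":
    # Toy example: one substitution (ɛ -> ə) and one insertion (w).
    ref = [["h", "ɛ", "l", "oʊ"]]
    hyp = [["h", "ə", "l", "oʊ", "w"]]
    print(phone_error_rate(ref, hyp))  # -> 50.0

    # Per-language test PER values copied from the table above.
    per_language = {"English": 11.0, "French": 9.9, "German": 9.8,
                    "Italian": 9.1, "Russian": 6.6, "Spanish": 8.8}
    print(round(sum(per_language.values()) / len(per_language), 1))  # -> 9.2
```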

- **Developed by:** [Philipp Klumpp](https://scholar.google.com/citations?user=IWvgno4AAAAJ)
- **Model type:** [Wav2Vec2](https://huggingface.co/docs/transformers/model_doc/wav2vec2)
- **Languages:** Multilingual (English, French, German, Italian, Russian, Spanish)
- **License:** [Creative Commons Zero 1.0 (CC0)](https://creativecommons.org/publicdomain/zero/1.0/deed.en)
- **Fine-tuned from model:** [Wav2Vec2 XLSR-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53)
- **Fine-tuning dataset:** [Common Phone](https://zenodo.org/records/5846137) as published in [**Common Phone: A Multilingual Dataset for Robust Acoustic Modelling**](http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.81.pdf)

### Model Sources

- **Repository:** [GitHub](https://github.com/PKlumpp/phd_model)
- **Paper:** The final print of the thesis will be linked here.

## Contact

[Philipp Klumpp](mailto:[email protected])