Update README.md
README.md
CHANGED
@@ -30,8 +30,8 @@ the audiobook of the translation of _Le Petit Prince_ into the Chakavian dialect
 
 ### Model Description
 
-The model was finetuned for 80 epochs with an effective batch size of 16. Performance was inspected every 4 epochs, and the latest checkpoint
-is uploaded here.
+The model, already very potent in standard Croatian, was finetuned for 80 epochs with an effective batch size of 16. Performance was inspected every 4 epochs, and the latest checkpoint
+is uploaded here. Character error rate has been brought down from 11.54% to 3.95%, while word error rate has been lowered from 35.43% to 16.83%.
 
 - **Developed by:** Nikola Ljubešić, Peter Rupnik, Tea Perinčić
 - **Language(s) (NLP):** Croatian (hrv) - Chakavian dialect (ckm)
@@ -121,11 +121,13 @@ Only the `train` split was used in training.
 For evaluation, the `test` split of the [Mići Princ dataset](https://huggingface.co/datasets/classla/Mici_Princ) was used. The test split consists of two known speakers, Autor and Mići Princ, and two unknown speakers, Geograf and Dilavac. Important to note is that each speaker uses a different micro-dialect, so the test data is challenging on including two new micro-dialects.
 
 #### Metrics
-
-
-
-
-
+| speaker | WER vanilla | WER fine-tuned | WER reduction | CER vanilla | CER fine-tuned| CER reduction |
+|---|---|---|---|---|---|---|
+| all | 35.43% | 16.83% | 52.50% | 11.54% | 3.95% | 65.77% |
+| Autor | 38.96% | 14.29% | 63.32% | 10.24% | 2.93% | 71.39% |
+| Geograf | 20.94% | 11.57% | 44.75% | 4.99% | 2.19% | 56.11% |
+| Mići Princ | 45.31% | 16.62% | 63.32% | 12.21% | 5.09% | 58.31% |
+| Dilavac | 39.60% | 23.70% | 40.15% | 18.55% | 5.27% | 71.59% |
 
 ## Citation
 
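
The "WER reduction" and "CER reduction" columns in the metrics table added by this change look like relative error reductions, that is, (vanilla - fine-tuned) / vanilla. The README does not state the formula, so this is an assumption. The sketch below recomputes the columns from the reported error rates as a sanity check; the names `REPORTED`, `relative_reduction` and `reduction_table` are hypothetical and not part of the model repository.

```python
# Error rates in percent, copied from the metrics table:
# speaker -> (WER vanilla, WER fine-tuned, CER vanilla, CER fine-tuned)
REPORTED = {
    "all": (35.43, 16.83, 11.54, 3.95),
    "Autor": (38.96, 14.29, 10.24, 2.93),
    "Geograf": (20.94, 11.57, 4.99, 2.19),
    "Mići Princ": (45.31, 16.62, 12.21, 5.09),
    "Dilavac": (39.60, 23.70, 18.55, 5.27),
}


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction in percent (assumed formula, see lead-in)."""
    return 100.0 * (before - after) / before


def reduction_table(rates=REPORTED):
    """Map each speaker to (WER reduction, CER reduction), both in percent."""
    return {
        speaker: (
            relative_reduction(wer_before, wer_after),
            relative_reduction(cer_before, cer_after),
        )
        for speaker, (wer_before, wer_after, cer_before, cer_after) in rates.items()
    }


if __name__ == "__main__":
    for speaker, (wer_red, cer_red) in reduction_table().items():
        print(f"{speaker}: WER reduction {wer_red:.2f}%, CER reduction {cer_red:.2f}%")
```

Under this assumption, the recomputed values agree with the reduction columns in the table to two decimal places, which suggests the columns are relative rather than absolute (percentage-point) reductions.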