nljubesi committed on
Commit
4dd05ff
1 Parent(s): 5c11cde

Update README.md

Files changed (1)
  1. README.md +9 -7
README.md CHANGED
@@ -30,8 +30,8 @@ the audiobook of the translation of _Le Petit Prince_ into the Chakavian dialect
30
 
31
  ### Model Description
32
 
33
- The model was finetuned for 80 epochs with an effective batch size of 16. Performance was inspected every 4 epochs, and the latest checkpoint
34
- is uploaded here.
35
 
36
  - **Developed by:** Nikola Ljubešić, Peter Rupnik, Tea Perinčić
37
  - **Language(s) (NLP):** Croatian (hrv) - Chakavian dialect (ckm)
@@ -121,11 +121,13 @@ Only the `train` split was used in training.
121
  For evaluation, the `test` split of the [Mići Princ dataset](https://huggingface.co/datasets/classla/Mici_Princ) was used. The test split consists of two known speakers, Autor and Mići Princ, and two unknown speakers, Geograf and Dilavac. Note that each speaker uses a different micro-dialect, so the test data is challenging because it includes two micro-dialects not seen in training.
122
 
123
  #### Metrics
124
-
125
-
126
- * WER: 0.039493
127
- * CER: 0.168341
128
-
 
 
129
 
130
  ## Citation
131
 
 
30
 
31
  ### Model Description
32
 
33
+ The model, which already performed well on standard Croatian, was fine-tuned for 80 epochs with an effective batch size of 16. Performance was inspected every 4 epochs, and the latest checkpoint
34
+ is uploaded here. Character error rate has been brought down from 11.54% to 3.95%, while word error rate has been lowered from 35.43% to 16.83%.
35
 
36
  - **Developed by:** Nikola Ljubešić, Peter Rupnik, Tea Perinčić
37
  - **Language(s) (NLP):** Croatian (hrv) - Chakavian dialect (ckm)
 
121
  For evaluation, the `test` split of the [Mići Princ dataset](https://huggingface.co/datasets/classla/Mici_Princ) was used. The test split consists of two known speakers, Autor and Mići Princ, and two unknown speakers, Geograf and Dilavac. Note that each speaker uses a different micro-dialect, so the test data is challenging because it includes two micro-dialects not seen in training.
122
 
123
  #### Metrics
124
+ | speaker | WER vanilla | WER fine-tuned | WER reduction | CER vanilla | CER fine-tuned | CER reduction |
125
+ |---|---|---|---|---|---|---|
126
+ | all | 35.43% | 16.83% | 52.50% | 11.54% | 3.95% | 65.77% |
127
+ | Autor | 38.96% | 14.29% | 63.32% | 10.24% | 2.93% | 71.39% |
128
+ | Geograf | 20.94% | 11.57% | 44.75% | 4.99% | 2.19% | 56.11% |
129
+ | Mići Princ | 45.31% | 16.62% | 63.32% | 12.21% | 5.09% | 58.31% |
130
+ | Dilavac | 39.60% | 23.70% | 40.15% | 18.55% | 5.27% | 71.59% |
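
The "reduction" columns in the table above appear to be relative reductions, that is, the drop in error rate divided by the vanilla error rate. The README does not state the formula, so this is an assumption, but it reproduces the "all" row. A minimal sketch follows; the function name `relative_reduction` is hypothetical and not part of the model card.

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the relative error reduction in percent.

    Assumed formula (not stated in the README): (before - after) / before * 100.
    """
    if before <= 0:
        raise ValueError("baseline error rate must be positive")
    return (before - after) / before * 100.0


# Error rates in percent, copied from the "all" row of the metrics table.
wer_vanilla, wer_finetuned = 35.43, 16.83
cer_vanilla, cer_finetuned = 11.54, 3.95

print(f"WER reduction: {relative_reduction(wer_vanilla, wer_finetuned):.2f}%")  # 52.50%
print(f"CER reduction: {relative_reduction(cer_vanilla, cer_finetuned):.2f}%")  # 65.77%
```

The same call with the per-speaker values, for example `relative_reduction(39.60, 23.70)` for Dilavac's WER, gives the corresponding per-speaker figures.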
131
 
132
  ## Citation
133