marma committed 1e40ec9 · 1 parent: 45cee1b

Update README.md

Files changed (1): README.md +18 -5
README.md CHANGED
@@ -10,14 +10,27 @@ language:
  This is a [Whisper tiny](https://huggingface.co/openai/whisper-tiny) finetuned for Swedish using
  the [RixVox](https://huggingface.co/datasets/KBLab/rixvox) dataset.

+ Please note that this model, like every other encoder-decoder speech-to-text model, is prone to
+ hallucinating on unexpected inputs and treats the task as translation rather than transcription.
+ That is, your mileage may vary depending on filtering and the type of data.
+
+ In this release the entire encoder was frozen. Subsequent releases will not freeze it **if** the
+ generalization to other types of data (i.e., not parliamentary speeches) is retained without freezing
+ the encoder.
+
  ## Evaluation

- ### [Common Voice 11](#):
- * WER: XYZ
- * WER (normalized): XYZ
+ <!-- * Common Voice 11 WER: 17.18
+ * Common Voice 11 WER (normalized*): 12.24 -->
+ * Fleurs WER: 51.68
+ * Fleurs WER (normalized*): 48.09
+
+ *) Normalization is done by applying the following to the reference (source) and generated texts:

- * WER: 51.67615433270082
- * WER (normalized): 48.08777429467085
+ ```python
+ def normalize(s):
+     return ' '.join([ x for x in sub('[^0-9a-zåäöA-ZÅÄÖ ]', ' ', s.lower()).split() ])
+ ```

  ## Training
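The README reports WER on both raw and normalized text, and its `normalize` snippet uses `sub` without importing it (presumably `re.sub`). Below is a minimal, standard-library sketch of how such raw and normalized numbers could be computed. It assumes corpus-level WER (total word errors divided by total reference words, in percent). The commit does not say which WER implementation or aggregation was actually used, and `word_errors` and `wer` are hypothetical helpers written for illustration.

```python
import re


def normalize(s: str) -> str:
    # Same regex as the README snippet: lowercase, replace every character outside
    # [0-9a-zåäöA-ZÅÄÖ ] with a space, then collapse runs of whitespace.
    return " ".join(re.sub("[^0-9a-zåäöA-ZÅÄÖ ]", " ", s.lower()).split())


def word_errors(reference: list[str], hypothesis: list[str]) -> int:
    # Hypothetical helper: word-level Levenshtein distance
    # (substitutions + deletions + insertions), computed row by row.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,                            # deletion
                curr[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),   # substitution or match
            )
        prev = curr
    return prev[-1]


def wer(references: list[str], hypotheses: list[str], normalized: bool = False) -> float:
    # Hypothetical helper (assumption: corpus-level aggregation).
    # Assumes at least one reference word overall, otherwise division by zero.
    errors = total = 0
    for ref, hyp in zip(references, hypotheses):
        if normalized:
            ref, hyp = normalize(ref), normalize(hyp)
        ref_words, hyp_words = ref.split(), hyp.split()
        errors += word_errors(ref_words, hyp_words)
        total += len(ref_words)
    return 100 * errors / total


refs = ["Det här är ett test."]
hyps = ["det här är ett test"]
print(wer(refs, hyps))                   # 40.0 ("Det"/"det" and "test."/"test" count as errors)
print(wer(refs, hyps, normalized=True))  # 0.0
```

Because the character class keeps only digits, ASCII letters, å, ä, ö and spaces, every other character is replaced by a space. This includes punctuation but also letters such as é, so `normalize("idé")` yields `"id"`.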