---
language: mt
datasets:
- common_voice
tags:
- audio
- automatic-speech-recognition
- maltese
- whisper-large-v2
- masri-project
- malta
- university-of-malta
license: cc-by-4.0
widget:
model-index:
- name: whisper-largev2-maltese-8k-steps-64h
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: MASRI-TEST Corpus
      type: MLRS/masri_test
      split: test
      args:
        language: mt
    metrics:
    - name: WER
      type: wer
      value: 19.830
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: MASRI-DEV Corpus
      type: MLRS/masri_dev
      split: validation
      args:
        language: mt
    metrics:
    - name: WER
      type: wer
      value: 19.734
---

# whisper-largev2-maltese-8k-steps-64h

The "whisper-largev2-maltese-8k-steps-64h" is an acoustic model suitable for Automatic Speech Recognition in Maltese. It is the result of fine-tuning the model "openai/whisper-large-v2" with around 64 hours of Maltese data developed by the MASRI Project at the University of Malta between 2019 and 2021. Most of the data is available at the MASRI Project homepage: https://www.um.edu.mt/projects/masri/.

The specific list of corpora used to fine-tune the model is:

- MASRI-HEADSET v2 (6h39m)
- MASRI-Farfield (9h37m)
- MASRI-Booths (2h27m)
- MASRI-MEP (1h17m)
- MASRI-COMVO (7h29m)
- MASRI-TUBE (13h17m)
- MASRI-MERLIN (25h18m) *Not available at the MASRI Project homepage

The fine-tuning process was performed during March 2023 on the servers of the Language and Voice Lab (https://lvl.ru.is/) at Reykjavík University (Iceland) by Carlos Daniel Hernández Mena.
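
The card does not document the training hyperparameters. The sketch below shows how such a run might be configured with `transformers`; apart from the 8,000-step count taken from the model name, every value is an illustrative assumption, not the actual training recipe.

```python
# Hypothetical configuration sketch: only max_steps=8000 is implied by the
# model name ("8k-steps"); all other values are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-largev2-maltese-8k-steps-64h",
    max_steps=8000,                   # from the model name
    per_device_train_batch_size=16,   # assumed
    gradient_accumulation_steps=1,    # assumed
    learning_rate=1e-5,               # common Whisper fine-tuning value, assumed
    warmup_steps=500,                 # assumed
    fp16=True,                        # assumed
    predict_with_generate=True,
)
```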

# Evaluation
```python
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor and model.
MODEL_NAME = "carlosdanielhernandezmena/whisper-largev2-maltese-8k-steps-64h"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")

# Load the dataset.
from datasets import load_dataset, Audio
ds = load_dataset("MLRS/masri_test", split="test")

# Downsample to 16kHz.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Process the dataset.
def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    batch["reference"] = processor.tokenizer._normalize(batch["normalized_text"])

    with torch.no_grad():
        predicted_ids = model.generate(input_features.to("cuda"))[0]

    transcription = processor.decode(predicted_ids)
    batch["prediction"] = processor.tokenizer._normalize(transcription)

    return batch

# Do the evaluation.
result = ds.map(map_to_pred)

# Compute the overall WER.
from evaluate import load

wer = load("wer")
WER = 100 * wer.compute(references=result["reference"], predictions=result["prediction"])
print(WER)
```
**Test Result**: 19.830687830687832
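
Judging from the metadata above, the dev-set figure (WER 19.734 on the MASRI-DEV Corpus) should be reproducible with the same snippet by swapping in the dev corpus; this one-line substitution is a suggestion, not part of the original card:

```python
# Evaluate on the MASRI-DEV corpus instead (its split is named "validation").
ds = load_dataset("MLRS/masri_dev", split="validation")
```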

# BibTeX entry and citation info
*When publishing results based on these models please refer to:*
```bibtex
@misc{mena2023whisperlargev2maltese,
  title={Acoustic Model in Maltese: whisper-largev2-maltese-8k-steps-64h.},
  author={Hernandez Mena, Carlos Daniel},
  year={2023},
  url={https://huggingface.co/carlosdanielhernandezmena/whisper-largev2-maltese-8k-steps-64h},
}
```

# Acknowledgements

The MASRI Project is funded by the University of Malta Research Fund Awards. We want to thank Merlin Publishers (Malta) for providing the audiobooks used to create the MASRI-MERLIN Corpus.

Thanks to Jón Guðnason, head of the Language and Voice Lab, for providing the computational power that made this model possible. We also want to thank the "Language Technology Programme for Icelandic 2019-2023", which is managed and coordinated by Almannarómur and funded by the Icelandic Ministry of Education, Science and Culture.

Special thanks to Björn Ingi Stefánsson for setting up the configuration of the server where this model was trained.