---
language: ca
datasets:
- projecte-aina/3catparla_asr
tags:
- audio
- automatic-speech-recognition
- catalan
- whisper-large-v3
- projecte-aina
- barcelona-supercomputing-center
- bsc
license: apache-2.0
model-index:
- name: whisper-large-v3-ca-3catparla
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: 3CatParla (Test)
      type: projecte-aina/3catparla_asr
      split: test
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 0.96
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: 3CatParla (Dev)
      type: projecte-aina/3catparla_asr
      split: dev
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 0.92
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 17.0 (Test)
      type: mozilla-foundation/common_voice_17_0
      split: test
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 10.32
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 17.0 (Dev)
      type: mozilla-foundation/common_voice_17_0
      split: validation
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 9.26
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Balearic female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 12.25
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Balearic male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 12.18
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Central female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.51
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Central male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.73
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northern female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.09
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northern male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.28
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northwestern female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 7.88
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northwestern male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.44
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Valencian female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 9.58
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Valencian male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 9.10
---
# whisper-large-v3-ca-3catparla

**Paper:** [3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition](https://iberspeech.tech/)

"whisper-large-v3-ca-3catparla" is an acoustic model for Automatic Speech Recognition in Catalan. It is the result of fine-tuning the model "openai/whisper-large-v3" with 710 hours of Catalan data released by [Projecte AINA](https://projecteaina.cat/) from Barcelona, Spain.

The dataset used to create the model is called [3CatParla](https://huggingface.co/datasets/projecte-aina/3catparla_asr).

The fine-tuning was performed during July 2024 on the servers of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Carlos Daniel Hernández Mena](https://huggingface.co/carlosdanielhernandezmena).

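For quick transcription of a single short recording, the model can presumably be used through the `transformers` high-level `pipeline` API, as with other Whisper checkpoints on the Hub. A minimal sketch; the audio file name is a placeholder, and the helper function is illustrative rather than part of this repository:

```python
MODEL_ID = "projecte-aina/whisper-large-v3-ca-3catparla"

def build_transcriber(device: str = "cuda"):
    """Create an ASR pipeline for this checkpoint.

    Requires `transformers` and `torch`; imports are deferred so the
    module can be inspected without loading the (large) model.
    """
    import torch
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        torch_dtype=torch.float16,
        device=device,
    )

# Example usage (placeholder file name):
# asr = build_transcriber()
# print(asr("some_catalan_audio.wav")["text"])
```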
# Evaluation
```python
import torch
from datasets import load_dataset, Audio
from evaluate import load
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor and model.
MODEL_NAME = "projecte-aina/whisper-large-v3-ca-3catparla"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")

# Load the test split of the 3CatParla dataset.
ds = load_dataset("projecte-aina/3catparla_asr", split="test")

# Downsample to 16 kHz, the sampling rate Whisper expects.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Transcribe each example and normalize both reference and prediction.
def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    batch["reference"] = processor.tokenizer._normalize(batch["normalized_text"])

    with torch.no_grad():
        predicted_ids = model.generate(input_features.to("cuda"))[0]

    transcription = processor.decode(predicted_ids)
    batch["prediction"] = processor.tokenizer._normalize(transcription)

    return batch

# Run the evaluation.
result = ds.map(map_to_pred)

# Compute the overall WER.
wer = load("wer")
WER = 100 * wer.compute(references=result["reference"], predictions=result["prediction"])
print(WER)
```
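The `_normalize` call above applies Whisper's internal text normalization before scoring, so that WER is not inflated by casing or punctuation differences. A rough, pure-Python approximation of that step, for illustration only (not the exact normalizer):

```python
import re

def normalize(text: str) -> str:
    """Rough approximation of Whisper-style text normalization:
    lowercase, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)    # drop punctuation, keep accented letters
    return re.sub(r"\s+", " ", text).strip()

print(normalize("Bon dia, món!"))  # -> "bon dia món"
```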
**Test Result**: 0.96 % WER on the 3CatParla test split.
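For reference, WER (Word Error Rate) is the word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words, so 0.96 means fewer than one word in a hundred is wrong. A minimal, library-free sketch of the metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, single-row dynamic programming.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = min(d[j] + 1,                              # deletion
                      d[j - 1] + 1,                          # insertion
                      prev + (ref[i - 1] != hyp[j - 1]))     # substitution
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

# One deleted word out of four reference words -> 25.0 % WER.
print(100 * wer("bon dia a tothom", "bon dia tothom"))  # -> 25.0
```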

# BibTeX entry and citation info
When publishing results based on this model, please cite:
```bibtex
@misc{mena2024whisperlarge3catparla,
      title={Acoustic Model in Catalan: whisper-large-v3-ca-3catparla.},
      author={Hernandez Mena, Carlos Daniel},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla},
      year={2024}
}
```

# Acknowledgements

This model has been promoted and financed by the Government of Catalonia through the Aina project.