---
language: ca
datasets:
- projecte-aina/3catparla_asr
tags:
- audio
- automatic-speech-recognition
- catalan
- whisper-large-v3
- projecte-aina
- barcelona-supercomputing-center
- bsc
license: apache-2.0
model-index:
- name: whisper-large-v3-ca-3catparla
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: 3CatParla (Test)
      type: projecte-aina/3catparla_asr
      split: test
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 0.96
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: 3CatParla (Dev)
      type: projecte-aina/3catparla_asr
      split: dev
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 0.92
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 17.0 (Test)
      type: mozilla-foundation/common_voice_17_0
      split: test
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 10.32
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 17.0 (Dev)
      type: mozilla-foundation/common_voice_17_0
      split: validation
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 9.26
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Balearic female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 12.25
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Balearic male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 12.18
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Central female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.51
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Central male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.73
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northern female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.09
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northern male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.28
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northwestern female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 7.88
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northwestern male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.44
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Valencian female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 9.58
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice Benchmark Catalan Accents
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Valencian male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 9.10
---
# whisper-large-v3-ca-3catparla

**Paper:** [3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition](https://iberspeech.tech/)

"whisper-large-v3-ca-3catparla" is an acoustic model for Automatic Speech Recognition in Catalan. It is the result of fine-tuning the model "openai/whisper-large-v3" with 710 hours of Catalan data released by [Projecte AINA](https://projecteaina.cat/) from Barcelona, Spain.

The dataset used to create the model is called [3CatParla](https://huggingface.co/datasets/projecte-aina/3catparla_asr).

The fine-tuning was performed during July 2024 on the servers of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Carlos Daniel Hernández Mena](https://huggingface.co/carlosdanielhernandezmena).

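For quick transcription of a single short recording, the model can presumably be used through the `transformers` high-level `pipeline` API, as with other Whisper checkpoints on the Hub. A minimal sketch; the audio file name is a placeholder, and the helper function is illustrative rather than part of this repository:

```python
MODEL_ID = "projecte-aina/whisper-large-v3-ca-3catparla"

def build_transcriber(device: str = "cuda"):
    """Create an ASR pipeline for this checkpoint.

    Requires `transformers` and `torch`; imports are deferred so the
    module can be inspected without loading the (large) model.
    """
    import torch
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        torch_dtype=torch.float16,
        device=device,
    )

# Example usage (placeholder file name):
# asr = build_transcriber()
# print(asr("some_catalan_audio.wav")["text"])
```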
# Evaluation
```python
import torch
from datasets import load_dataset, Audio
from evaluate import load
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor and model.
MODEL_NAME = "projecte-aina/whisper-large-v3-ca-3catparla"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")

# Load the test split of the 3CatParla dataset.
ds = load_dataset("projecte-aina/3catparla_asr", split="test")

# Downsample to 16 kHz, the sampling rate Whisper expects.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Transcribe each example and normalize both reference and prediction.
def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    batch["reference"] = processor.tokenizer._normalize(batch["normalized_text"])

    with torch.no_grad():
        predicted_ids = model.generate(input_features.to("cuda"))[0]

    transcription = processor.decode(predicted_ids)
    batch["prediction"] = processor.tokenizer._normalize(transcription)

    return batch

# Run the evaluation.
result = ds.map(map_to_pred)

# Compute the overall WER.
wer = load("wer")
WER = 100 * wer.compute(references=result["reference"], predictions=result["prediction"])
print(WER)
```
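The `_normalize` call above applies Whisper's internal text normalization before scoring, so that WER is not inflated by casing or punctuation differences. A rough, pure-Python approximation of that step, for illustration only (not the exact normalizer):

```python
import re

def normalize(text: str) -> str:
    """Rough approximation of Whisper-style text normalization:
    lowercase, strip punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)    # drop punctuation, keep accented letters
    return re.sub(r"\s+", " ", text).strip()

print(normalize("Bon dia, món!"))  # -> "bon dia món"
```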
**Test Result**: 0.96 % WER on the 3CatParla test split.
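For reference, WER (Word Error Rate) is the word-level edit distance (substitutions + insertions + deletions) divided by the number of reference words, so 0.96 means fewer than one word in a hundred is wrong. A minimal, library-free sketch of the metric:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, single-row dynamic programming.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = min(d[j] + 1,                              # deletion
                      d[j - 1] + 1,                          # insertion
                      prev + (ref[i - 1] != hyp[j - 1]))     # substitution
            prev, d[j] = d[j], cur
    return d[len(hyp)] / len(ref)

# One deleted word out of four reference words -> 25.0 % WER.
print(100 * wer("bon dia a tothom", "bon dia tothom"))  # -> 25.0
```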

# BibTeX entry and citation info
When publishing results based on this model, please cite:
```bibtex
@misc{mena2024whisperlarge3catparla,
      title={Acoustic Model in Catalan: whisper-large-v3-ca-3catparla.},
      author={Hernandez Mena, Carlos Daniel},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla},
      year={2024}
}
```

# Acknowledgements

This model has been promoted and financed by the Government of Catalonia through the Aina project.