---
language: ca
datasets:
- projecte-aina/3catparla_asr
tags:
- audio
- automatic-speech-recognition
- catalan
- whisper-large-v3
- projecte-aina
- barcelona-supercomputing-center
- bsc
license: apache-2.0
model-index:
- name: whisper-large-v3-ca-3catparla
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: 3CatParla (Test)
      type: projecte-aina/3catparla_asr
      split: test
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 0.96
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: 3CatParla (Dev)
      type: projecte-aina/3catparla_asr
      split: dev
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 0.92
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 17.0 (Test)
      type: mozilla-foundation/common_voice_17_0
      split: test
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 10.32
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Mozilla Common Voice 17.0 (Dev)
      type: mozilla-foundation/common_voice_17_0
      split: validation
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 9.26
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV Benchmark Catalan Accents (Balearic fem)
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Balearic female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 12.25
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV Benchmark Catalan Accents (Balearic male)
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Balearic male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 12.18
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV Benchmark Catalan Accents (Central fem)
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Central female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.51
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV Benchmark Catalan Accents (Central male)
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Central male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.73
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV Benchmark Catalan Accents (Northern fem)
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northern female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.09
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV Benchmark Catalan Accents (Northern male)
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northern male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.28
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV Benchmark Catalan Accents (Northwestern fem)
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northwestern female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 7.88
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV Benchmark Catalan Accents (Northwestern male)
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Northwestern male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 8.44
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV Benchmark Catalan Accents (Valencian fem)
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Valencian female
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 9.58
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: CV Benchmark Catalan Accents (Valencian male)
      type: projecte-aina/commonvoice_benchmark_catalan_accents
      split: Valencian male
      args:
        language: ca
    metrics:
    - name: WER
      type: wer
      value: 9.1
library_name: transformers
---
# whisper-large-v3-ca-3catparla
**Paper:** [3CatParla: A New Open-Source Corpus of Broadcast TV in Catalan for Automatic Speech Recognition](https://iberspeech.tech/)

"whisper-large-v3-ca-3catparla" is an acoustic model for automatic speech recognition (ASR) in Catalan. It was obtained by fine-tuning ["openai/whisper-large-v3"](https://huggingface.co/openai/whisper-large-v3) on 710 hours of Catalan data released by [Projecte AINA](https://projecteaina.cat/) from Barcelona, Spain.

The dataset used to fine-tune the model is ["3CatParla"](https://huggingface.co/datasets/projecte-aina/3catparla_asr).

The fine-tuning was performed in July 2024 on the servers of the [Barcelona Supercomputing Center](https://www.bsc.es/) by [Carlos Daniel Hernández Mena](https://huggingface.co/carlosdanielhernandezmena).
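To transcribe a single recording rather than run the full evaluation, a minimal sketch using the `transformers` ASR pipeline could look like the following. The helper name `transcribe`, the 30-second chunking for long audio, and forcing Catalan through `generate_kwargs` are our own assumptions, not settings documented for this model. Passing a file path to the pipeline also assumes `ffmpeg` is installed for audio decoding.

```python
def transcribe(audio_path: str, model_name: str = "projecte-aina/whisper-large-v3-ca-3catparla") -> str:
    """Transcribe one Catalan audio file with the Hugging Face ASR pipeline (sketch)."""
    # Imports are local so that merely defining this hypothetical helper downloads nothing.
    import torch
    from transformers import pipeline

    use_gpu = torch.cuda.is_available()
    asr = pipeline(
        "automatic-speech-recognition",
        model=model_name,
        torch_dtype=torch.float16 if use_gpu else torch.float32,
        device=0 if use_gpu else -1,  # first GPU if available, otherwise CPU
        chunk_length_s=30,  # assumption: split long recordings into 30 s windows
    )
    # Assumption: force Catalan transcription instead of relying on language auto-detection.
    result = asr(audio_path, generate_kwargs={"language": "catalan", "task": "transcribe"})
    return result["text"]
```

For example, `transcribe("example.wav")` returns the transcript as a string; `example.wav` is a placeholder for your own file.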

# Evaluation
```python
import torch
from datasets import Audio, load_dataset
from evaluate import load
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor and the model.
MODEL_NAME = "projecte-aina/whisper-large-v3-ca-3catparla"
processor = WhisperProcessor.from_pretrained(MODEL_NAME)
model = WhisperForConditionalGeneration.from_pretrained(MODEL_NAME).to("cuda")

# Load the test split of 3CatParla.
ds = load_dataset("projecte-aina/3catparla_asr", split="test")

# Resample the audio to 16 kHz, the sampling rate Whisper expects.
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

# Transcribe one example and normalize both the reference and the prediction.
def map_to_pred(batch):
    audio = batch["audio"]
    input_features = processor(audio["array"], sampling_rate=audio["sampling_rate"], return_tensors="pt").input_features
    batch["reference"] = processor.tokenizer.normalize(batch["normalized_text"])

    with torch.no_grad():
        predicted_ids = model.generate(input_features.to("cuda"))[0]

    # Drop special tokens such as <|startoftranscript|> before normalizing.
    transcription = processor.decode(predicted_ids, skip_special_tokens=True)
    batch["prediction"] = processor.tokenizer.normalize(transcription)

    return batch

# Run the evaluation.
result = ds.map(map_to_pred)

# Compute the overall WER, reported as a percentage.
wer = load("wer")
WER = 100 * wer.compute(references=result["reference"], predictions=result["prediction"])
print(WER)
```
**Test Result (WER, %)**: 0.96
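The reported figure is a word error rate (WER) expressed as a percentage, since the evaluation script multiplies the score from `evaluate` by 100. As a rough illustration of what that score measures, the following sketch computes a corpus-level WER from scratch with a word-level Levenshtein distance. The function names are hypothetical, and the sketch assumes the texts are already normalized and split on whitespace. To our understanding, the `evaluate` "wer" metric (backed by `jiwer`) likewise sums errors over the whole corpus before dividing, rather than averaging per-utterance WERs.

```python
from typing import List


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum substitutions + deletions + insertions that turn ref into hyp."""
    # prev[j] is the distance between the reference prefix processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)]


def corpus_wer_percent(references: List[str], predictions: List[str]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words * 100."""
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    errors = 0
    ref_words = 0
    for ref, hyp in zip(references, predictions):
        ref_tokens = ref.split()
        errors += word_edit_distance(ref_tokens, hyp.split())
        ref_words += len(ref_tokens)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / ref_words


refs = ["el gat dorm al sofà", "bon dia a tothom"]
preds = ["el gat dorm al sofà", "bon dia tothom"]
print(round(corpus_wer_percent(refs, preds), 2))  # 11.11
```

Here one deleted word out of nine reference words gives about 11.11%, whereas averaging the two per-sentence WERs (0% and 25%) would give 12.5%.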

# BibTeX entry and citation info
* When publishing results based on this model, please cite:
```bibtex
@misc{mena2024whisperlarge3catparla,
      title={Acoustic Model in Catalan: whisper-large-v3-ca-3catparla.}, 
      author={Hernandez Mena, Carlos Daniel},
      organization={Barcelona Supercomputing Center},
      url={https://huggingface.co/projecte-aina/whisper-large-v3-ca-3catparla},
      year={2024}
}
```
# Acknowledgements

This model has been promoted and financed by the Government of Catalonia through the Aina project.