Model Card for whisper-large-v3-cantonese

This model is a fine-tuned version of Whisper large-v3 (openai/whisper-large-v3), trained for automatic speech recognition (ASR) in Cantonese (Yue). It was fine-tuned on the Common Voice 17 dataset for 10 epochs with a learning rate of 1e-7.

Model Details

  • Model Architecture: Whisper large-v3
  • Language: Cantonese (Yue)
  • Training Dataset: Common Voice 17
  • Training Duration: 10 epochs
  • Learning Rate: 1e-7
  • Frozen Layers: 12 layers in the decoder are frozen during training
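
The layer freezing listed above could be sketched roughly as follows. This is a minimal illustration, not the author's training code: the card does not say which 12 decoder layers were frozen, so the sketch assumes the first 12, and `freeze_decoder_layers` is a hypothetical helper name. It relies only on the `model.model.decoder.layers` attribute path that `WhisperForConditionalGeneration` exposes in Transformers.

```python
def freeze_decoder_layers(model, num_layers=12):
    """Disable gradients for the first `num_layers` decoder layers.

    ASSUMPTION: the card only says "12 layers in the decoder are frozen";
    freezing the *first* 12 is a guess made for illustration.
    """
    layers = model.model.decoder.layers
    if num_layers > len(layers):
        raise ValueError(
            f"model has only {len(layers)} decoder layers, cannot freeze {num_layers}"
        )

    frozen = 0
    for layer in list(layers)[:num_layers]:
        # Parameters with requires_grad=False are skipped by the optimizer.
        for param in layer.parameters():
            param.requires_grad = False
        frozen += 1
    return frozen
```

With a loaded model, `freeze_decoder_layers(model)` would be called once before building the optimizer, so that only the remaining layers receive updates.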

Model Description

  • Developed by: khleeloo (Rita Frieske)
  • Language(s) (NLP): Cantonese
  • License: apache-2.0
  • Finetuned from model: openai/whisper-large-v3

Uses

This model is intended for researchers and developers interested in building applications that require speech recognition capabilities in Cantonese. It can be used in various applications, including:

  • Voice assistants
  • Transcription services
  • Accessibility features for Cantonese speakers

Bias, Risks, and Limitations

The model is specifically fine-tuned for Cantonese and may not perform well on other languages or dialects. Performance may vary based on the quality and accent of the audio input. The model's effectiveness is dependent on the diversity and richness of the training data.

How to Get Started with the Model

To use this model, you can load it using the Hugging Face Transformers library:

from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Repository ID of this model on the Hugging Face Hub
model_id = "khleeloo/whisper-large-v3-cantonese"

model = WhisperForConditionalGeneration.from_pretrained(model_id)
processor = WhisperProcessor.from_pretrained(model_id)
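
Once the model and processor are loaded, transcription might look like the sketch below. The `transcribe` helper is a hypothetical name introduced here for illustration, and the waveform is assumed to be a 1-D float array already resampled to 16 kHz, the sampling rate Whisper's feature extractor expects. How the audio is read from disk is left out.

```python
def transcribe(model, processor, waveform, sampling_rate=16000):
    """Transcribe one mono waveform with a Whisper model and processor.

    ASSUMPTION: `waveform` is a 1-D float array sampled at 16 kHz.
    """
    # Convert raw audio into log-Mel input features.
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")

    # Generate token IDs, then decode them to text without special tokens.
    predicted_ids = model.generate(inputs.input_features)
    texts = processor.batch_decode(predicted_ids, skip_special_tokens=True)
    return texts[0]
```

Passing the model and processor in as arguments keeps loading separate from inference, so the same objects can be reused across many audio clips.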

Training

Training Data

  • mozilla-foundation/common_voice_17_0

Evaluation

Testing Data, Factors & Metrics

The model was evaluated on three test splits: Common Voice 17.0 yue, Common Voice 15.0 yue, and Common Voice 15.0 zh-HK. The Common Voice 15.0 test sets were used to evaluate Whisper large-v3.

Metrics

Character Error Rate (CER), since Cantonese is written in a character-based script.
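
For reference, CER is the character-level edit distance between hypothesis and reference, divided by the number of reference characters. The sketch below is a plain-Python illustration of that definition, not the evaluation script used for this card. Any text normalization (punctuation, whitespace) applied before scoring is not stated in the card and is left out here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars


# One substituted character out of five reference characters.
print(cer(["我哋去食飯"], ["我地去食飯"]))  # 0.2
```

Summing edits and characters over the whole test set before dividing weights long utterances more than short ones. Averaging per-utterance CER is a different convention and gives different numbers.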

Results

Model                      CV 15.0 zh-HK   CV 15.0 yue   CV 17.0 yue
Whisper large v3           10.8            16            -
Whisper cantonese (ours)   18.88           8.77          7.26

Explanation: our model was trained on the yue data, which is closer to vernacular Cantonese, and not on the zh-HK data, which contains more written Cantonese. Vernacular speech is the natural target for a speech recognition model, which explains the weaker performance on the zh-HK split of Common Voice.

Citation

BibTeX:

@misc{rita_frieske_2025,
  author    = {{Rita Frieske}},
  title     = {whisper-large-v3-cantonese},
  year      = 2025,
  url       = {https://huggingface.co/khleeloo/whisper-large-v3-cantonese},
  doi       = {10.57967/hf/4393},
  publisher = {Hugging Face}
}

Model Card Authors

https://khleeloo.github.io/
