---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-tiny
tags:
  - generated_from_trainer
model-index:
  - name: test-whisper-tiny-th
    results: []
---

# test-whisper-tiny-th

This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on an unknown dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

- Loss: 0.8875
- CER: 34.9798
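
For quick sanity checks, the model can be driven with the standard `transformers` ASR pipeline. This is a minimal sketch, not part of the original card; the Hub repo id and the audio filename below are assumptions, so substitute your own paths.

```python
# Minimal inference sketch using the transformers ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="kwanchiva/test-whisper-tiny-th",  # hypothetical repo id; use the real Hub path or a local checkpoint
)

# The pipeline accepts a path to an audio file (or a raw waveform array).
result = asr("sample_thai_audio.wav")  # hypothetical file
print(result["text"])
```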

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a training-arguments sketch follows the list):

- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5.0
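
As a rough guide to reproducing this configuration, here is a sketch of the corresponding `Seq2SeqTrainingArguments`. The `output_dir` is a placeholder, and the Adam betas/epsilon shown are the `transformers` defaults, which match the values listed above.

```python
# Sketch of training arguments mirroring the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./test-whisper-tiny-th",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=5.0,
    adam_beta1=0.9,     # transformers default, matches the listed optimizer
    adam_beta2=0.999,   # transformers default
    adam_epsilon=1e-8,  # transformers default
)
```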

### Training results

| Training Loss | Epoch | Step | Validation Loss | CER     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| No log        | 1.0   | 7    | 0.9713          | 37.2984 |
| 1.1414        | 2.0   | 14   | 0.9285          | 34.4758 |
| 0.8953        | 3.0   | 21   | 0.9022          | 35.2823 |
| 0.8953        | 4.0   | 28   | 0.8911          | 52.9234 |
| 0.8159        | 5.0   | 35   | 0.8875          | 34.9798 |
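
The CER column is the character error rate in percent. A minimal sketch of computing it with the `evaluate` library follows; the transcript strings are illustrative placeholders, not data from this model's evaluation set.

```python
# Character error rate (CER) with the evaluate library.
import evaluate

cer_metric = evaluate.load("cer")
predictions = ["illustrative model transcript"]     # hypothetical outputs
references = ["illustrative reference transcript"]  # hypothetical ground truth
score = cer_metric.compute(predictions=predictions, references=references)
print(100 * score)  # reported as a percentage, as in the table above
```
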
For reference, WER (%, lower is better) of related Thai ASR models across benchmark sets:

| Model | WER (CV18) | WER (Gowejee) | WER (LOTUS-TRD) | WER (Thai Dialect) | WER (Elderly) | WER (Gigaspeech2) | WER (Fleurs) | WER (Distant Meeting) | WER (Podcast) |
|---|---|---|---|---|---|---|---|---|---|
| whisper-large-v3 | 18.75 | 46.59 | 48.14 | 57.82 | 12.27 | 33.26 | 24.08 | 72.57 | 41.24 |
| airesearch-wav2vec2-large-xlsr-53-th | 8.49 | 17.28 | 63.01 | 48.53 | 11.29 | 52.72 | 37.32 | 85.11 | 65.12 |
| thonburian-whisper-th-large-v3-combined | 7.62 | 22.06 | 41.95 | 26.53 | 1.63 | 25.22 | 13.90 | 64.68 | 32.42 |
| monsoon-whisper-medium-gigaspeech2 | 11.66 | 20.50 | 41.04 | 42.06 | 7.57 | 21.40 | 21.54 | 51.65 | 38.89 |
| pathumma-whisper-th-large-v3 | 8.68 | 9.84 | 15.47 | 19.85 | 1.53 | 21.66 | 15.65 | 51.56 | 36.47 |

A second table compares audio language models on ASR and on ThaiSER emotion/gender classification (see the citations below):

| Model | ASR-th CV18 th (WER↓) | ASR-en CV18 En (WER↓) | ASR-en Librispeech En (WER↓) | ThaiSER Emotion (Acc↑, F1↑) | ThaiSER Gender (Acc↑, F1↑) |
|---|---|---|---|---|---|
| Typhoon-Audio-Preview | 13.26 | 13.34 (partial result) | 5.07 (partial result) | 41.50, 33.48 | 96.20, 96.69 |
| DIVA | 69.15 (partial result) | 37.40 | 49.06 | 18.64, 8.16 | 47.50, 35.90 |
| Gemini-1.5-Pro | 16.49 | 12.94 | 25.83 | 26.00, 18.26 | 79.66, 77.32 |
| Pathumma-llm-audio-1.0.0 | 12.03 | 12.20 | 11.36 | 42.30, 36.88 | 90.30, 92.07 |

### Framework versions

The following library versions were used (a quick environment check follows the list):

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
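
To verify a reproduction environment against these versions, an optional sanity-check sketch (not part of the original card):

```python
# Optional check that installed versions match those used for training.
import datasets
import tokenizers
import torch
import transformers

expected = {
    transformers: "4.44.2",
    datasets: "2.21.0",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    assert module.__version__ == version, (module.__name__, module.__version__)
assert torch.__version__.startswith("2.4.0"), torch.__version__  # built with cu121
```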

## Citation

```bibtex
@misc{tipkasorn2024pathumma,
    title        = { {Pathumma-Audio} },
    author       = { Pattara Tipkasorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
    url          = { https://huggingface.co/nectec/Pathumma-llm-audio-1.0.0 },
    publisher    = { Hugging Face },
    year         = { 2024 },
}
```

```bibtex
@misc{tipkasorn2024PatWhisper,
    title        = { {Pathumma Whisper Large V3 (TH)} },
    author       = { Pattara Tipkasorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
    url          = { https://huggingface.co/nectec/Pathumma-whisper-th-large-v3 },
    publisher    = { Hugging Face },
    year         = { 2024 },
}
```