---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-tiny
tags:
  - generated_from_trainer
model-index:
  - name: test-whisper-tiny-th
    results: []
---

# test-whisper-tiny-th

This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on an unknown dataset. It achieves the following results on the evaluation set (a usage sketch follows the list):

- Loss: 0.8875
- CER: 34.9798
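
For quick sanity checks, the model can be driven with the standard `transformers` ASR pipeline. This is a minimal sketch, not part of the original card; the Hub repo id and the audio filename below are assumptions, so substitute your own paths.

```python
# Minimal inference sketch using the transformers ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="kwanchiva/test-whisper-tiny-th",  # hypothetical repo id; use the real Hub path or a local checkpoint
)

# The pipeline accepts a path to an audio file (or a raw waveform array).
result = asr("sample_thai_audio.wav")  # hypothetical file
print(result["text"])
```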

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a training-arguments sketch follows the list):

- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5.0
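
As a rough guide to reproducing this configuration, here is a sketch of the corresponding `Seq2SeqTrainingArguments`. The `output_dir` is a placeholder, and the Adam betas/epsilon shown are the `transformers` defaults, which match the values listed above.

```python
# Sketch of training arguments mirroring the hyperparameters listed above.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./test-whisper-tiny-th",  # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=5.0,
    adam_beta1=0.9,     # transformers default, matches the listed optimizer
    adam_beta2=0.999,   # transformers default
    adam_epsilon=1e-8,  # transformers default
)
```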

### Training results

| Training Loss | Epoch | Step | Validation Loss | CER     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| No log        | 1.0   | 7    | 0.9713          | 37.2984 |
| 1.1414        | 2.0   | 14   | 0.9285          | 34.4758 |
| 0.8953        | 3.0   | 21   | 0.9022          | 35.2823 |
| 0.8953        | 4.0   | 28   | 0.8911          | 52.9234 |
| 0.8159        | 5.0   | 35   | 0.8875          | 34.9798 |
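
The CER column is the character error rate in percent. A minimal sketch of computing it with the `evaluate` library follows; the transcript strings are illustrative placeholders, not data from this model's evaluation set.

```python
# Character error rate (CER) with the evaluate library.
import evaluate

cer_metric = evaluate.load("cer")
predictions = ["illustrative model transcript"]     # hypothetical outputs
references = ["illustrative reference transcript"]  # hypothetical ground truth
score = cer_metric.compute(predictions=predictions, references=references)
print(100 * score)  # reported as a percentage, as in the table above
```
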
For reference, WER (%, lower is better) of related Thai ASR models across benchmark sets:

| Model | WER (CV18) | WER (Gowejee) | WER (LOTUS-TRD) | WER (Thai Dialect) | WER (Elderly) | WER (Gigaspeech2) | WER (Fleurs) | WER (Distant Meeting) | WER (Podcast) |
|---|---|---|---|---|---|---|---|---|---|
| whisper-large-v3 | 18.75 | 46.59 | 48.14 | 57.82 | 12.27 | 33.26 | 24.08 | 72.57 | 41.24 |
| airesearch-wav2vec2-large-xlsr-53-th | 8.49 | 17.28 | 63.01 | 48.53 | 11.29 | 52.72 | 37.32 | 85.11 | 65.12 |
| thonburian-whisper-th-large-v3-combined | 7.62 | 22.06 | 41.95 | 26.53 | 1.63 | 25.22 | 13.90 | 64.68 | 32.42 |
| monsoon-whisper-medium-gigaspeech2 | 11.66 | 20.50 | 41.04 | 42.06 | 7.57 | 21.40 | 21.54 | 51.65 | 38.89 |
| pathumma-whisper-th-large-v3 | 8.68 | 9.84 | 15.47 | 19.85 | 1.53 | 21.66 | 15.65 | 51.56 | 36.47 |

A second table compares audio language models on ASR and on ThaiSER emotion/gender classification (see the citations below):

| Model | ASR-th CV18 th (WER↓) | ASR-en CV18 En (WER↓) | ASR-en Librispeech En (WER↓) | ThaiSER Emotion (Acc↑, F1↑) | ThaiSER Gender (Acc↑, F1↑) |
|---|---|---|---|---|---|
| Typhoon-Audio-Preview | 13.26 | 13.34 (partial result) | 5.07 (partial result) | 41.50, 33.48 | 96.20, 96.69 |
| DIVA | 69.15 (partial result) | 37.40 | 49.06 | 18.64, 8.16 | 47.50, 35.90 |
| Gemini-1.5-Pro | 16.49 | 12.94 | 25.83 | 26.00, 18.26 | 79.66, 77.32 |
| Pathumma-llm-audio-1.0.0 | 12.03 | 12.20 | 11.36 | 42.30, 36.88 | 90.30, 92.07 |

### Framework versions

The following library versions were used (a quick environment check follows the list):

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
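
To verify a reproduction environment against these versions, an optional sanity-check sketch (not part of the original card):

```python
# Optional check that installed versions match those used for training.
import datasets
import tokenizers
import torch
import transformers

expected = {
    transformers: "4.44.2",
    datasets: "2.21.0",
    tokenizers: "0.19.1",
}
for module, version in expected.items():
    assert module.__version__ == version, (module.__name__, module.__version__)
assert torch.__version__.startswith("2.4.0"), torch.__version__  # built with cu121
```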

## Citation

```bibtex
@misc{tipkasorn2024pathumma,
    title        = { {Pathumma-Audio} },
    author       = { Pattara Tipkasorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
    url          = { https://huggingface.co/nectec/Pathumma-llm-audio-1.0.0 },
    publisher    = { Hugging Face },
    year         = { 2024 },
}
```

```bibtex
@misc{tipkasorn2024PatWhisper,
    title        = { {Pathumma Whisper Large V3 (TH)} },
    author       = { Pattara Tipkasorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
    url          = { https://huggingface.co/nectec/Pathumma-whisper-th-large-v3 },
    publisher    = { Hugging Face },
    year         = { 2024 },
}
```