---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-tiny
tags:
- generated_from_trainer
model-index:
- name: test-whisper-tiny-th
  results: []
---

# test-whisper-tiny-th

This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.8875
- CER: 34.9798

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch below for how they map onto `Seq2SeqTrainingArguments`):
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5.0

### Training results

| Training Loss | Epoch | Step | Validation Loss | CER     |
|:-------------:|:-----:|:----:|:---------------:|:-------:|
| No log        | 1.0   | 7    | 0.9713          | 37.2984 |
| 1.1414        | 2.0   | 14   | 0.9285          | 34.4758 |
| 0.8953        | 3.0   | 21   | 0.9022          | 35.2823 |
| 0.8953        | 4.0   | 28   | 0.8911          | 52.9234 |
| 0.8159        | 5.0   | 35   | 0.8875          | 34.9798 |

For reference, the tables below report results for related Thai ASR models and for audio language models on public benchmarks.

#### Thai ASR models (WER ↓, per evaluation set)

| Model                                    | CV18  | Gowejee | LOTUS-TRD | Thai Dialect | Elderly | Gigaspeech2 | Fleurs | Distant Meeting | Podcast |
|:-----------------------------------------|:-----:|:-------:|:---------:|:------------:|:-------:|:-----------:|:------:|:---------------:|:-------:|
| whisper-large-v3                         | 18.75 | 46.59   | 48.14     | 57.82        | 12.27   | 33.26       | 24.08  | 72.57           | 41.24   |
| airesearch-wav2vec2-large-xlsr-53-th     | 8.49  | 17.28   | 63.01     | 48.53        | 11.29   | 52.72       | 37.32  | 85.11           | 65.12   |
| thonburian-whisper-th-large-v3-combined  | 7.62  | 22.06   | 41.95     | 26.53        | 1.63    | 25.22       | 13.90  | 64.68           | 32.42   |
| monsoon-whisper-medium-gigaspeech2       | 11.66 | 20.50   | 41.04     | 42.06        | 7.57    | 21.40       | 21.54  | 51.65           | 38.89   |
| pathumma-whisper-th-large-v3             | 8.68  | 9.84    | 15.47     | 19.85        | 1.53    | 21.66       | 15.65  | 51.56           | 36.47   |

#### Audio language models

| Model                    | ASR-th: CV18 (WER↓)     | ASR-en: CV18 (WER↓)     | ASR-en: LibriSpeech (WER↓) | ThaiSER Emotion (Acc↑, F1↑) | ThaiSER Gender (Acc↑, F1↑) |
|:-------------------------|:-----------------------:|:-----------------------:|:--------------------------:|:---------------------------:|:--------------------------:|
| Typhoon-Audio-Preview    | 13.26                   | 13.34 (partial result)  | 5.07 (partial result)      | 41.50, 33.48                | 96.20, 96.69               |
| DIVA                     | 69.15 (partial result)  | 37.40                   | 49.06                      | 18.64, 8.16                 | 47.50, 35.90               |
| Gemini-1.5-Pro           | 16.49                   | 12.94                   | 25.83                      | 26.00, 18.26                | 79.66, 77.32               |
| Pathumma-llm-audio-1.0.0 | 12.03                   | 12.20                   | 11.36                      | 42.30, 36.88                | 90.30, 92.07               |

### Framework versions

- Transformers 4.44.2
- Pytorch 2.4.0+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1
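### Training configuration sketch

The original training script is not included in this repository. The sketch below shows how the hyperparameters listed above could map onto `transformers`' `Seq2SeqTrainingArguments`; the `output_dir` value is a placeholder, and the per-epoch evaluation schedule is an assumption inferred from the results table.

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameter list above. The listed Adam settings
# (betas=(0.9, 0.999), epsilon=1e-08) and the linear LR scheduler match
# the Trainer's defaults, so they need no explicit arguments here.
training_args = Seq2SeqTrainingArguments(
    output_dir="test-whisper-tiny-th",  # placeholder output directory
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=5.0,
    eval_strategy="epoch",  # assumed: the table reports one eval per epoch
)
```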
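## How to use

The card does not include usage instructions, so here is a minimal inference sketch using the `transformers` automatic-speech-recognition pipeline. The model id `test-whisper-tiny-th` and the input file `audio.wav` are placeholders; point them at this checkpoint and your own audio.

```python
import torch
from transformers import pipeline

# Load the fine-tuned checkpoint. Replace the placeholder model id with
# this repository's Hub id or a local checkpoint directory.
asr = pipeline(
    "automatic-speech-recognition",
    model="test-whisper-tiny-th",
    device="cuda:0" if torch.cuda.is_available() else "cpu",
)

# Whisper expects 16 kHz audio; the pipeline resamples file inputs for you.
# Forcing the language and task avoids Whisper's automatic language detection.
result = asr(
    "audio.wav",  # placeholder audio file
    generate_kwargs={"language": "th", "task": "transcribe"},
)
print(result["text"])
```

Given the evaluation CER of roughly 35, transcriptions from this test checkpoint should be treated as experimental.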
## Citation

```
@misc{tipkasorn2024pathumma,
    title     = { {Pathumma-Audio} },
    author    = { Pattara Tipkasorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
    url       = { https://huggingface.co/nectec/Pathumma-llm-audio-1.0.0 },
    publisher = { Hugging Face },
    year      = { 2024 },
}
```

```
@misc{tipkasorn2024PatWhisper,
    title     = { {Pathumma Whisper Large V3 (TH)} },
    author    = { Pattara Tipkasorn and Wayupuk Sommuang and Oatsada Chatthong and Kwanchiva Thangthai },
    url       = { https://huggingface.co/nectec/Pathumma-whisper-th-large-v3 },
    publisher = { Hugging Face },
    year      = { 2024 },
}
```