metadata

library_name: transformers
license: cc-by-sa-4.0
base_model: airesearch/wav2vec2-large-xlsr-53-th
tags:
  - generated_from_trainer
metrics:
  - accuracy
model-index:
  - name: wav2vec2-large-xlsr-53-th-speech-emotion-recognition-4c
    results: []

wav2vec2-large-xlsr-53-th-speech-emotion-recognition-4c

This model is a fine-tuned version of airesearch/wav2vec2-large-xlsr-53-th. The training dataset is not recorded in this card. On the evaluation set, the final checkpoint (step 1340) achieves the following results:

Loss: 0.4840
Accuracy: 0.8270

Model description

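More information needed. The model name suggests a four-class ("4c") speech emotion recognition classifier built on the Thai wav2vec2 base model listed above; the emotion labels are not documented in this card.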

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 128
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
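
The batch-size and scheduler entries above interact, and the arithmetic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the training script: it assumes single-device training (consistent with 32 × 4 = 128), takes the total of 1340 optimizer steps from the last row of the training results table, and assumes the warmup length is the warmup ratio applied to that total, rounded up. The name `linear_lr` is illustrative.

```python
import math

# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 3e-05
TRAIN_BATCH_SIZE = 32
GRADIENT_ACCUMULATION_STEPS = 4
WARMUP_RATIO = 0.1

# Assumption: 1340 total optimizer steps, read from the final row of the
# training results table rather than from the hyperparameter list.
TOTAL_STEPS = 1340

# Assumption: a single device, so the effective batch is the per-device
# batch size times the number of accumulated micro-batches.
effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS

# Assumption: warmup length is the warmup ratio applied to the total
# step count, rounded up.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return LEARNING_RATE * step / warmup_steps
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / (TOTAL_STEPS - warmup_steps)


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size)
    print("warmup steps:", warmup_steps)
    for step in (0, 67, 134, 737, 1340):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at 3e-05 once warmup ends and then decays linearly to zero at the last step.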

Training results

Training Loss	Epoch	Step	Validation Loss	Accuracy
1.3113	0.9963	67	1.3085	0.3879
0.9422	1.9926	134	0.9515	0.5786
0.8097	2.9888	201	0.7753	0.6958
0.7	4.0	269	0.6606	0.7591
0.6038	4.9963	336	0.5957	0.7833
0.5796	5.9926	403	0.6206	0.7805
0.5413	6.9888	470	0.5471	0.7991
0.4974	8.0	538	0.5784	0.8009
0.4623	8.9963	605	0.5212	0.8130
0.4503	9.9926	672	0.5237	0.8242
0.428	10.9888	739	0.4823	0.8233
0.3958	12.0	807	0.5192	0.8270
0.3953	12.9963	874	0.4854	0.8270
0.3696	13.9926	941	0.4877	0.8251
0.3715	14.9888	1008	0.4845	0.8279
0.386	16.0	1076	0.4829	0.8233
0.3505	16.9963	1143	0.4850	0.8214
0.3166	17.9926	1210	0.4973	0.8270
0.366	18.9888	1277	0.4829	0.8270
0.3386	19.9257	1340	0.4840	0.8270
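
The table above can also be scanned programmatically when choosing a checkpoint. The following Python sketch transcribes the step, validation loss, and accuracy columns from the rows above; the `EvalRow` type and the two helper functions are illustrative names, not part of any library, and the card does not say which checkpoint was actually saved.

```python
from typing import List, NamedTuple


class EvalRow(NamedTuple):
    step: int
    val_loss: float
    accuracy: float


# (step, validation loss, accuracy) transcribed from the table above.
ROWS: List[EvalRow] = [
    EvalRow(67, 1.3085, 0.3879),
    EvalRow(134, 0.9515, 0.5786),
    EvalRow(201, 0.7753, 0.6958),
    EvalRow(269, 0.6606, 0.7591),
    EvalRow(336, 0.5957, 0.7833),
    EvalRow(403, 0.6206, 0.7805),
    EvalRow(470, 0.5471, 0.7991),
    EvalRow(538, 0.5784, 0.8009),
    EvalRow(605, 0.5212, 0.8130),
    EvalRow(672, 0.5237, 0.8242),
    EvalRow(739, 0.4823, 0.8233),
    EvalRow(807, 0.5192, 0.8270),
    EvalRow(874, 0.4854, 0.8270),
    EvalRow(941, 0.4877, 0.8251),
    EvalRow(1008, 0.4845, 0.8279),
    EvalRow(1076, 0.4829, 0.8233),
    EvalRow(1143, 0.4850, 0.8214),
    EvalRow(1210, 0.4973, 0.8270),
    EvalRow(1277, 0.4829, 0.8270),
    EvalRow(1340, 0.4840, 0.8270),
]


def best_by_loss(rows: List[EvalRow]) -> EvalRow:
    """Row with the lowest validation loss (first one wins on ties)."""
    return min(rows, key=lambda r: r.val_loss)


def best_by_accuracy(rows: List[EvalRow]) -> EvalRow:
    """Row with the highest accuracy (first one wins on ties)."""
    return max(rows, key=lambda r: r.accuracy)


if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss(ROWS))
    print("highest accuracy:", best_by_accuracy(ROWS))
    print("final row:", ROWS[-1])
```

On these rows the lowest validation loss is 0.4823 at step 739 and the highest accuracy is 0.8279 at step 1008, while the headline loss and accuracy reported in this card match the final row at step 1340.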

Framework versions

Transformers 4.44.2
PyTorch 2.4.1+cu121
Datasets 3.0.1
Tokenizers 0.19.1