whisper-small-diarization-0.2

This model is a fine-tuned version of openai/whisper-small on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4057
  • Speech Scored: 802.6678
  • Speech Miss: 209.2589
  • Speech Falarm: 9.0497
  • Speaker Miss: 412.5650
  • Speaker Falarm: 185.8256
  • Speaker Error: 153.1979
  • Speaker Correct: 1198.4045
  • Diarization Error: 751.5885
  • Frames: 1500.0
  • Speaker Wide Frames: 1564.7402
  • Speech Scored Ratio: 0.5351
  • Speech Miss Ratio: 0.1395
  • Speech Falarm Ratio: 0.0060
  • Speaker Correct Ratio: 0.7989
  • Speaker Miss Ratio: 0.2382
  • Speaker Falarm Ratio: 0.1203
  • Speaker Error Ratio: 0.0831
  • Diarization Error Ratio: 0.4416
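
The headline Diarization Error above is the sum of the Speaker Miss, Speaker Falarm, and Speaker Error components, as in the usual DER-style decomposition. A minimal sketch checking that with the values reported above (variable names are illustrative; the normaliser used for the ratio columns is not documented in this card, so it is left out):

```python
# Sketch: recompute the headline Diarization Error from its components,
# using the evaluation-set values reported above (frame counts).
speaker_miss = 412.5650    # reference speaker active but not detected
speaker_falarm = 185.8256  # speaker hypothesised where none is active
speaker_error = 153.1979   # speech attributed to the wrong speaker

diarization_error = speaker_miss + speaker_falarm + speaker_error
print(f"Diarization Error: {diarization_error:.4f}")  # 751.5885, matching the value above
```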

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 16
  • eval_batch_size: 24
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 10
  • mixed_precision_training: Native AMP
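
These settings correspond to standard Hugging Face Trainer arguments. A minimal sketch of how they would map onto Seq2SeqTrainingArguments, assuming the stock transformers Trainer was used (the output_dir and the choice of the Seq2Seq variant are assumptions, not stated in this card):

```python
from transformers import Seq2SeqTrainingArguments

# Sketch only: maps the hyperparameters listed above onto TrainingArguments fields.
training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-small-diarization-0.2",  # assumed name, not from this card
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=24,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    num_train_epochs=10,
    fp16=True,  # "Native AMP" mixed-precision training
)
```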

Training results

| Training Loss | Epoch | Step | Validation Loss | Speech Scored | Speech Miss | Speech Falarm | Speaker Miss | Speaker Falarm | Speaker Error | Speaker Correct | Diarization Error | Frames | Speaker Wide Frames | Speech Scored Ratio | Speech Miss Ratio | Speech Falarm Ratio | Speaker Correct Ratio | Speaker Miss Ratio | Speaker Falarm Ratio | Speaker Error Ratio | Diarization Error Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4753 | 1.0 | 225 | 0.4855 | 731.3740 | 280.5527 | 69.0541 | 614.8317 | 199.7986 | 151.0314 | 1127.7690 | 965.6617 | 1500.0 | 1564.7402 | 0.4876 | 0.1870 | 0.0460 | 0.7518 | 0.3772 | 0.1827 | 0.0796 | 0.6395 |
| 0.4572 | 2.0 | 450 | 0.4857 | 642.0541 | 369.8727 | 29.7105 | 773.0375 | 89.8413 | 131.1212 | 1124.9596 | 994.0 | 1500.0 | 1564.7402 | 0.4280 | 0.2466 | 0.0198 | 0.7500 | 0.4566 | 0.0839 | 0.0709 | 0.6114 |
| 0.4638 | 3.0 | 675 | 0.4618 | 748.0445 | 263.8823 | 43.8745 | 466.9677 | 356.8117 | 116.3025 | 1147.8718 | 940.0820 | 1500.0 | 1564.7402 | 0.4987 | 0.1759 | 0.0292 | 0.7652 | 0.2922 | 0.2364 | 0.0631 | 0.5917 |
| 0.4423 | 4.0 | 900 | 0.4477 | 740.9529 | 270.9738 | 30.4473 | 509.6905 | 235.4464 | 132.9747 | 1162.9712 | 878.1116 | 1500.0 | 1564.7402 | 0.4940 | 0.1806 | 0.0203 | 0.7753 | 0.3072 | 0.1581 | 0.0712 | 0.5365 |
| 0.4164 | 5.0 | 1125 | 0.4309 | 737.6809 | 274.2459 | 12.4037 | 512.4150 | 173.0157 | 138.8300 | 1178.9698 | 824.2607 | 1500.0 | 1564.7402 | 0.4918 | 0.1828 | 0.0083 | 0.7860 | 0.3010 | 0.1130 | 0.0754 | 0.4893 |
| 0.3924 | 6.0 | 1350 | 0.4112 | 812.0453 | 199.8814 | 13.5414 | 382.1125 | 253.1543 | 140.2999 | 1194.7111 | 775.5667 | 1500.0 | 1564.7402 | 0.5414 | 0.1333 | 0.0090 | 0.7965 | 0.2235 | 0.1713 | 0.0755 | 0.4702 |
| 0.3765 | 7.0 | 1575 | 0.4085 | 806.7515 | 205.1752 | 12.1369 | 405.6992 | 202.0323 | 149.4699 | 1197.7762 | 757.2014 | 1500.0 | 1564.7402 | 0.5378 | 0.1368 | 0.0081 | 0.7985 | 0.2361 | 0.1250 | 0.0829 | 0.4439 |
| 0.3814 | 8.0 | 1800 | 0.4051 | 802.6016 | 209.3252 | 9.5911 | 398.2677 | 213.9582 | 144.1378 | 1199.8329 | 756.3636 | 1500.0 | 1564.7402 | 0.5351 | 0.1396 | 0.0064 | 0.7999 | 0.2367 | 0.1275 | 0.0794 | 0.4436 |
| 0.3965 | 9.0 | 2025 | 0.4111 | 768.8736 | 243.0532 | 6.9250 | 474.9355 | 148.3069 | 146.9695 | 1194.2729 | 770.2119 | 1500.0 | 1564.7402 | 0.5126 | 0.1620 | 0.0046 | 0.7962 | 0.2742 | 0.0932 | 0.0806 | 0.4480 |
| 0.4048 | 10.0 | 2250 | 0.4057 | 802.6678 | 209.2589 | 9.0497 | 412.5650 | 185.8256 | 153.1979 | 1198.4045 | 751.5885 | 1500.0 | 1564.7402 | 0.5351 | 0.1395 | 0.0060 | 0.7989 | 0.2382 | 0.1203 | 0.0831 | 0.4416 |
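
A short sketch that picks the best epoch by validation Diarization Error Ratio, with the values copied from the table above:

```python
# Per-epoch validation Diarization Error Ratio, copied from the table above.
der_ratio_by_epoch = {
    1: 0.6395, 2: 0.6114, 3: 0.5917, 4: 0.5365, 5: 0.4893,
    6: 0.4702, 7: 0.4439, 8: 0.4436, 9: 0.4480, 10: 0.4416,
}

best_epoch = min(der_ratio_by_epoch, key=der_ratio_by_epoch.get)
print(f"Best epoch by Diarization Error Ratio: {best_epoch} "
      f"({der_ratio_by_epoch[best_epoch]:.4f})")  # epoch 10, 0.4416
```

Note that validation loss is lowest at epoch 8 (0.4051) while the Diarization Error Ratio is lowest at the final epoch, so the preferred checkpoint depends on which metric you track.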

Framework versions

  • Transformers 4.36.2
  • PyTorch 2.0.0
  • Datasets 2.16.1
  • Tokenizers 0.15.0
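
To check that a local environment matches these versions before trying to reproduce results, a small sketch using importlib.metadata (the mapping from the names above to installed package names, e.g. PyTorch → torch, is assumed):

```python
from importlib.metadata import version

# Versions listed under "Framework versions" above; package names are assumed.
expected = {
    "transformers": "4.36.2",
    "torch": "2.0.0",
    "datasets": "2.16.1",
    "tokenizers": "0.15.0",
}

for package, wanted in expected.items():
    installed = version(package)
    status = "ok" if installed == wanted else f"mismatch (found {installed})"
    print(f"{package}=={wanted}: {status}")
```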