xls-asr-vi-40h-1B
This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b, trained on 40 hours of speech from the FPT Open Speech Dataset (FOSD) and Common Voice 7.0.
Benchmark WER results:
| | VIVOS | COMMON VOICE 7.0 | COMMON VOICE 8.0 |
|---|---|---|---|
| without LM | 25.93 | 34.21 | |
| with 4-gram LM | 24.11 | 25.84 | 31.158 |
Benchmark CER results:
| | VIVOS | COMMON VOICE 7.0 | COMMON VOICE 8.0 |
|---|---|---|---|
| without LM | 9.24 | 19.94 | |
| with 4-gram LM | 10.37 | 12.96 | 16.179 |
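For readers unfamiliar with the two metrics in the tables above, here is a minimal sketch of how WER and CER are conventionally computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, at word level for WER and at character level for CER. The function names are illustrative, and the text normalisation applied by eval.py is not described in this card, so this sketch will not necessarily reproduce the reported numbers. It returns fractions, while the benchmark tables appear to report percentages.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distance from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate as a fraction of the reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate as a fraction of the reference character count."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Multiplying the returned fraction by 100 gives a value on the same scale as the benchmark tables.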
Evaluation
Use the eval.py script to run the evaluation:
python eval.py --model_id geninhu/xls-asr-vi-40h-1B --dataset mozilla-foundation/common_voice_7_0 --config vi --split test --log_outputs
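Besides the evaluation script, the checkpoint can probably be tried on a single audio file with the Transformers `pipeline` API. The following is a minimal sketch under that assumption; the `transcribe` helper and the command-line handling are illustrative, not part of the original card, and it assumes `transformers`, `torch`, and ffmpeg are installed. Whether the 4-gram LM is applied depends on what the model repository ships and on whether `pyctcdecode` is available, which is not verified here.

```python
import sys

MODEL_ID = "geninhu/xls-asr-vi-40h-1B"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the fine-tuned checkpoint (hypothetical helper)."""
    # Imported lazily so this module loads even without the heavy dependency.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)
    # Given a file path, the pipeline decodes the audio with ffmpeg and
    # resamples it to the sampling rate the model expects.
    return asr(audio_path)["text"]


if __name__ == "__main__" and len(sys.argv) > 1:
    print(transcribe(sys.argv[1]))
```

Building the pipeline once and reusing it across files avoids reloading the 1B-parameter model on every call.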
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1500
- num_epochs: 10.0
- mixed_precision_training: Native AMP
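To make the batch-size and scheduler settings above concrete, here is a small sketch of the arithmetic they imply. It assumes the usual linear schedule with warmup (ramp from 0 to the peak rate over the warmup steps, then linear decay to 0) and takes the total step count of 40500 from the last row of the training results table. That total is an assumption, since the listed `num_epochs` does not match the epochs shown in that table. The function name `linear_lr` is illustrative.

```python
BASE_LR = 5e-5          # learning_rate
WARMUP_STEPS = 1500     # lr_scheduler_warmup_steps
TOTAL_STEPS = 40500     # assumption: final step in the training results table
TRAIN_BATCH_SIZE = 16   # train_batch_size
GRAD_ACCUM_STEPS = 2    # gradient_accumulation_steps

# Samples contributing to each optimizer update; matches total_train_batch_size
# if training ran on a single device.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS


def linear_lr(step: int, base_lr: float = BASE_LR,
              warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Learning rate at an optimizer step under linear warmup and linear decay."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


if __name__ == "__main__":
    for step in (0, 750, 1500, 21000, 40500):
        print(step, linear_lr(step))
```

Under these assumptions the rate peaks at step 1500 and has fallen to half its peak by step 21000, the midpoint of the decay phase.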
Training results
Training Loss | Epoch | Step | Validation Loss | Wer |
---|---|---|---|---|
4.6222 | 1.85 | 1500 | 5.9479 | 0.5474 |
1.1362 | 3.7 | 3000 | 7.9799 | 0.5094 |
0.7814 | 5.56 | 4500 | 5.0330 | 0.4724 |
0.6281 | 7.41 | 6000 | 2.3484 | 0.5020 |
0.5472 | 9.26 | 7500 | 2.2495 | 0.4793 |
0.4827 | 11.11 | 9000 | 1.1530 | 0.4768 |
0.4327 | 12.96 | 10500 | 1.6160 | 0.4646 |
0.3989 | 14.81 | 12000 | 3.2633 | 0.4703 |
0.3522 | 16.67 | 13500 | 2.2337 | 0.4708 |
0.3201 | 18.52 | 15000 | 3.6879 | 0.4565 |
0.2899 | 20.37 | 16500 | 5.4389 | 0.4599 |
0.2776 | 22.22 | 18000 | 3.5284 | 0.4537 |
0.2574 | 24.07 | 19500 | 2.1759 | 0.4649 |
0.2378 | 25.93 | 21000 | 3.3901 | 0.4448 |
0.217 | 27.78 | 22500 | 1.1632 | 0.4565 |
0.2115 | 29.63 | 24000 | 1.7441 | 0.4232 |
0.1959 | 31.48 | 25500 | 3.4992 | 0.4304 |
0.187 | 33.33 | 27000 | 3.6163 | 0.4369 |
0.1748 | 35.19 | 28500 | 3.6038 | 0.4467 |
0.17 | 37.04 | 30000 | 2.9708 | 0.4362 |
0.159 | 38.89 | 31500 | 3.2045 | 0.4279 |
0.153 | 40.74 | 33000 | 3.2427 | 0.4287 |
0.1463 | 42.59 | 34500 | 3.5439 | 0.4270 |
0.139 | 44.44 | 36000 | 3.9381 | 0.4150 |
0.1352 | 46.3 | 37500 | 4.1744 | 0.4092 |
0.1369 | 48.15 | 39000 | 4.2279 | 0.4154 |
0.1273 | 50.0 | 40500 | 4.1691 | 0.4133 |
Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.17.1.dev0
- Tokenizers 0.11.0
Evaluation results
- Test WER (with LM) on Common Voice 7.0 (self-reported): 25.846
- Test CER (with LM) on Common Voice 7.0 (self-reported): 12.961
- Test WER (with LM) on Common Voice 8.0 (self-reported): 31.158
- Test CER (with LM) on Common Voice 8.0 (self-reported): 16.179