## Model description
This model is a fine-tuned version of facebook/wav2vec2-xls-r-1b on the MOZILLA-FOUNDATION/COMMON_VOICE_8_0 - JA dataset.
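A minimal usage sketch, assuming the standard `transformers` ASR pipeline and a 16 kHz mono recording (the audio file path below is a placeholder):

```python
# Minimal inference sketch: transcribe a 16 kHz mono audio file with the
# automatic-speech-recognition pipeline from transformers.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="vutankiet2901/wav2vec2-xls-r-1b-ja",
)

# "sample_ja.wav" is a placeholder path to a 16 kHz mono recording.
print(asr("sample_ja.wav")["text"])
```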
### Benchmark WER result:

| | Common Voice 7.0 | Common Voice 8.0 |
|---|---|---|
| without LM | 16.97 | 17.95 |
| with 4-gram LM | 11.77 | 12.23 |
### Benchmark CER result:

| | Common Voice 7.0 | Common Voice 8.0 |
|---|---|---|
| without LM | 6.82 | 7.05 |
| with 4-gram LM | 5.22 | 5.33 |
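The LM-boosted numbers above come from beam-search decoding with the 4-gram LM. A sketch of how such decoding is typically run, assuming the repository bundles a KenLM n-gram that `Wav2Vec2ProcessorWithLM` can load (requires `pyctcdecode` and `kenlm`; the audio path is a placeholder):

```python
# Sketch of LM-boosted decoding; assumes the model repo bundles a KenLM n-gram
# so that Wav2Vec2ProcessorWithLM can load it (pyctcdecode + kenlm installed).
import torch
import librosa
from transformers import AutoModelForCTC, Wav2Vec2ProcessorWithLM

model_id = "vutankiet2901/wav2vec2-xls-r-1b-ja"
processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)
model = AutoModelForCTC.from_pretrained(model_id)

# "sample_ja.wav" is a placeholder for a 16 kHz mono recording.
speech, _ = librosa.load("sample_ja.wav", sr=16_000)
inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# batch_decode runs beam search against the n-gram LM instead of greedy CTC decoding.
print(processor.batch_decode(logits.numpy()).text[0])
```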
## Evaluation
Please use the eval.py file to run the evaluation:

```bash
pip install mecab-python3 unidic-lite pykakasi
python eval.py --model_id vutankiet2901/wav2vec2-xls-r-1b-ja --dataset mozilla-foundation/common_voice_8_0 --config ja --split test --log_outputs
```
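The extra packages above handle Japanese text segmentation and normalization during scoring. As an illustration only (this is not eval.py), WER/CER over MeCab-segmented transcripts could be computed roughly like this, assuming `jiwer` is also installed:

```python
# Illustrative scoring sketch (not eval.py itself): segment Japanese text into
# words with MeCab, then score WER/CER with jiwer.
# Assumes: pip install mecab-python3 unidic-lite jiwer
import MeCab
import jiwer

tagger = MeCab.Tagger("-Owakati")  # -Owakati emits space-separated words

def segment(text: str) -> str:
    return tagger.parse(text).strip()

# Placeholder reference/hypothesis pair for illustration.
reference = "今日はいい天気です"
hypothesis = "今日は良い天気です"

print("WER:", jiwer.wer(segment(reference), segment(hypothesis)))
print("CER:", jiwer.cer(reference, hypothesis))
```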
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 100.0
- mixed_precision_training: Native AMP
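
As a rough sketch, these settings map onto `transformers` `TrainingArguments` as follows; the output directory is a placeholder, and the effective batch size of 64 comes from 16 × 4 gradient-accumulation steps on a single device (the actual device count is not stated above):

```python
# Approximate mapping of the listed hyperparameters onto TrainingArguments;
# a sketch, not the exact training script used for this checkpoint.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./wav2vec2-xls-r-1b-ja",   # placeholder output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,          # 16 x 4 = effective batch size of 64
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=2000,
    num_train_epochs=100.0,
    fp16=True,                              # "Native AMP" mixed precision
)
```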
### Training results
| Training Loss | Epoch | Step | Validation Loss | WER | CER |
|---|---|---|---|---|---|
| 3.484 | 9.49 | 1500 | 1.1849 | 0.7543 | 0.4099 |
| 1.3582 | 18.98 | 3000 | 0.4320 | 0.3489 | 0.1591 |
| 1.1716 | 28.48 | 4500 | 0.3835 | 0.3175 | 0.1454 |
| 1.0951 | 37.97 | 6000 | 0.3732 | 0.3033 | 0.1405 |
| 1.04 | 47.47 | 7500 | 0.3485 | 0.2898 | 0.1360 |
| 0.9768 | 56.96 | 9000 | 0.3386 | 0.2787 | 0.1309 |
| 0.9129 | 66.45 | 10500 | 0.3363 | 0.2711 | 0.1272 |
| 0.8614 | 75.94 | 12000 | 0.3386 | 0.2676 | 0.1260 |
| 0.8092 | 85.44 | 13500 | 0.3356 | 0.2610 | 0.1240 |
| 0.7658 | 94.93 | 15000 | 0.3316 | 0.2564 | 0.1218 |
### Framework versions
- Transformers 4.16.0.dev0
- Pytorch 1.10.1+cu102
- Datasets 1.18.3
- Tokenizers 0.11.0
## Evaluation results

- Test WER (with LM) on Common Voice 7.0 (self-reported): 11.77
- Test CER (with LM) on Common Voice 7.0 (self-reported): 5.22
- Test WER (with LM) on Common Voice 8.0 (self-reported): 12.23
- Test CER (with LM) on Common Voice 8.0 (self-reported): 5.33
- Test WER (with LM) on Robust Speech Event - Dev Data (self-reported): 29.35
- Test CER (with LM) on Robust Speech Event - Dev Data (self-reported): 16.43
- Test CER on Robust Speech Event - Test Data (self-reported): 19.48