---
license: apache-2.0
language:
- ja
tags:
- automatic-speech-recognition
- common-voice
- hf-asr-leaderboard
- ja
- robust-speech-event
datasets:
- mozilla-foundation/common_voice_8_0
model-index:
- name: wav2vec2-xls-r-1b
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 7.0
      type: mozilla-foundation/common_voice_7_0
      args: ja
    metrics:
    - name: Test WER (with LM)
      type: wer
      value: 11.77
    - name: Test CER (with LM)
      type: cer
      value: 5.22
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 8.0
      type: mozilla-foundation/common_voice_8_0
      args: ja
    metrics:
    - name: Test WER (with LM)
      type: wer
      value: 12.23
    - name: Test CER (with LM)
      type: cer
      value: 5.33
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Robust Speech Event - Dev Data
      type: speech-recognition-community-v2/dev_data
      args: ja
    metrics:
    - name: Test WER (with LM)
      type: wer
      value: 29.35
    - name: Test CER (with LM)
      type: cer
      value: 16.43
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Robust Speech Event - Test Data
      type: speech-recognition-community-v2/eval_data
      args: ja
    metrics:
    - name: Test CER
      type: cer
      value: 19.48
---

## Model description

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-1b](https://huggingface.co/facebook/wav2vec2-xls-r-1b) on the Japanese (`ja`) subset of the [mozilla-foundation/common_voice_8_0](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) dataset.

### Benchmark WER results (%)

| | [Common Voice 7.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) | [Common Voice 8.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) |
|---|---|---|
| Without LM | 16.97 | 17.95 |
| With 4-gram LM | 11.77 | 12.23 |

### Benchmark CER results (%)

| | [Common Voice 7.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_7_0) | [Common Voice 8.0](https://huggingface.co/datasets/mozilla-foundation/common_voice_8_0) |
|---|---|---|
| Without LM | 6.82 | 7.05 |
| With 4-gram LM | 5.22 | 5.33 |
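
To put the two benchmark tables in perspective, the sketch below computes the relative error reduction obtained by adding the 4-gram LM. The numbers are copied from the tables above; the `RESULTS` dictionary and the `relative_reduction` helper are illustrative names, not part of this repository.

```python
# Test-set error rates (%) copied from the benchmark tables above,
# stored as (without LM, with 4-gram LM).
RESULTS = {
    ("WER", "Common Voice 7.0"): (16.97, 11.77),
    ("WER", "Common Voice 8.0"): (17.95, 12.23),
    ("CER", "Common Voice 7.0"): (6.82, 5.22),
    ("CER", "Common Voice 8.0"): (7.05, 5.33),
}


def relative_reduction(without_lm: float, with_lm: float) -> float:
    """Return the relative error reduction (in %) gained by adding the LM."""
    return 100.0 * (without_lm - with_lm) / without_lm


for (metric, dataset), (without_lm, with_lm) in RESULTS.items():
    gain = relative_reduction(without_lm, with_lm)
    print(f"{metric} on {dataset}: {without_lm:.2f} -> {with_lm:.2f} "
          f"({gain:.1f}% relative reduction)")
```

On these figures the LM lowers WER by about 30.6% (Common Voice 7.0) and 31.9% (Common Voice 8.0) in relative terms, and CER by about 23.5% and 24.4%.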

## Evaluation

Install the dependencies, then run `eval.py` to evaluate the model:

```bash
pip install mecab-python3 unidic-lite pykakasi
python eval.py --model_id vutankiet2901/wav2vec2-xls-r-1b-ja --dataset mozilla-foundation/common_voice_8_0 --config ja --split test --log_outputs
```
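
For readers unfamiliar with the metrics, here is a minimal, dependency-free sketch of how WER and CER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length. It is not the code in `eval.py`. The `mecab-python3` dependency suggests that Japanese text is segmented into words with MeCab before WER is computed, but the word lists below are segmented by hand for illustration, and the function names are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence, hyp: Sequence) -> float:
    """Edit distance normalised by the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


reference = "今日は良い天気です"
hypothesis = "今日は良い元気です"

# CER compares the two strings character by character.
print(f"CER: {error_rate(reference, hypothesis):.3f}")  # CER: 0.111

# WER compares word sequences; this segmentation is hand-written (assumption).
ref_words = ["今日", "は", "良い", "天気", "です"]
hyp_words = ["今日", "は", "良い", "元気", "です"]
print(f"WER: {error_rate(ref_words, hyp_words):.3f}")  # WER: 0.200
```

Because Japanese is written without spaces, WER depends on the word segmenter, whereas CER does not, which is one reason both metrics are reported.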

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2000
- num_epochs: 100.0
- mixed_precision_training: Native AMP
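
Two of the values above are derived quantities, and the sketch below shows how they fit together: the total batch size is the per-step batch size times the gradient accumulation steps, and a linear schedule with warmup ramps the learning rate up for 2000 steps and then decays it linearly to zero. The total step count is an assumption, estimated at about 158 optimizer steps per epoch from the training log (step 1500 at epoch 9.49); the helper names and the single-device default are hypothetical as well.

```python
LEARNING_RATE = 5e-5
WARMUP_STEPS = 2000
TRAIN_BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 4

# Assumption: roughly 158 optimizer steps per epoch over 100 epochs.
TOTAL_STEPS = 15_800


def effective_batch_size(per_step: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_step * accumulation * num_devices


def linear_schedule_lr(step: int,
                       base_lr: float = LEARNING_RATE,
                       warmup: int = WARMUP_STEPS,
                       total: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))


print("effective batch size:",
      effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))

for step in (0, 1000, 2000, 8900, 15_800):
    print(f"step {step:>6}: lr = {linear_schedule_lr(step):.2e}")
```

With a single device this reproduces the listed `total_train_batch_size` of 64; the learning rate peaks at 5e-05 once warmup ends at step 2000.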

### Training results

| Training Loss | Epoch | Step  | Validation Loss | WER    | CER    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|
| 3.484         | 9.49  | 1500  | 1.1849          | 0.7543 | 0.4099 |
| 1.3582        | 18.98 | 3000  | 0.4320          | 0.3489 | 0.1591 |
| 1.1716        | 28.48 | 4500  | 0.3835          | 0.3175 | 0.1454 |
| 1.0951        | 37.97 | 6000  | 0.3732          | 0.3033 | 0.1405 |
| 1.04          | 47.47 | 7500  | 0.3485          | 0.2898 | 0.1360 |
| 0.9768        | 56.96 | 9000  | 0.3386          | 0.2787 | 0.1309 |
| 0.9129        | 66.45 | 10500 | 0.3363          | 0.2711 | 0.1272 |
| 0.8614        | 75.94 | 12000 | 0.3386          | 0.2676 | 0.1260 |
| 0.8092        | 85.44 | 13500 | 0.3356          | 0.2610 | 0.1240 |
| 0.7658        | 94.93 | 15000 | 0.3316          | 0.2564 | 0.1218 |

### Framework versions

- Transformers 4.16.0.dev0
- PyTorch 1.10.1+cu102
- Datasets 1.18.3
- Tokenizers 0.11.0