File size: 2,651 Bytes
a6e8d57
 
90d66ba
a6e8d57
90d66ba
 
 
 
a5d9ae7
90d66ba
a6e8d57
 
 
492c67f
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
a6e8d57
 
 
 
f405841
a6e8d57
b0b1764
 
 
a6e8d57
 
 
 
 
 
 
 
 
 
 
90d66ba
 
 
 
 
a6e8d57
 
 
 
 
 
f405841
 
a6e8d57
 
f405841
a6e8d57
 
 
 
 
 
 
 
 
 
f405841
 
 
 
 
 
 
 
 
 
 
 
 
 
a6e8d57
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
---
license: apache-2.0
language: as
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning
- as
- robust-speech-event
datasets:
- common_voice
model-index:
- name: XLS-R-300M - Assamese
  results:
  - task: 
      name: Automatic Speech Recognition 
      type: automatic-speech-recognition
    dataset:
      name: Common Voice 7
      type: mozilla-foundation/common_voice_7_0
      args: as
    metrics:
       - name: Test WER
         type: wer
         value: 72.64
       - name: Test CER
         type: cer
         value: 27.35
---

# wav2vec2-large-xls-r-300m-assamese

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the common_voice_7_0 dataset.
It achieves the following results on the evaluation set:

- WER: 0.7954545454545454
- CER: 0.32341269841269843

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

To compute the evaluation parameters

```bash
cd wav2vec2-large-xls-r-300m-odia; python eval.py --model_id ./ --dataset mozilla-foundation/common_voice_7_0 --config or --split test --log_outputs
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-4
- train_batch_size: 16
- eval_batch_size: 8
- seed: not given
- gradient_accumulation_steps: 2
- total_train_batch_size: 32
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- num_epochs: 400
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch  | Step | Validation Loss | Wer      |
|:-------------:|:------:|:----:|:---------------:|:------:  |
| 1.584065      | NA     | 400  | 1.584065        | 0.915512 |
| 1.658865      | Na     | 800  | 1.658865        | 0.805096 |
| 1.882352      | NA     | 1200 | 1.882352        | 0.820742 |
| 1.881240      | NA     | 1600 | 1.881240        | 0.810907 |
| 2.159748      | NA     | 2000 | 2.159748        | 0.804202 |
| 1.992871      | NA     | 2400 | 1.992871        | 0.803308 |
| 2.201436      | NA     | 2800 | 2.201436        | 0.802861 |
| 2.165218      | NA     | 3200 | 2.165218        | 0.793920 |
| 2.253643      | NA     | 3600 | 2.253643        | 0.796603 |
| 2.265880      | NA     | 4000 | 2.265880        | 0.790344 |
| 2.293935      | NA     | 4400 | 2.293935        | 0.797050 |
| 2.288851      | NA     | 4800 | 2.288851        | 0.784086 |


### Framework versions

- Transformers 4.11.3
- Pytorch 1.10.0+cu113
- Datasets 1.13.3
- Tokenizers 0.10.3