w11wo committed on
Commit
863a5e1
1 Parent(s): 970dace

Update README.md

Files changed (1)
  1. README.md +26 -15
README.md CHANGED
@@ -8,7 +8,7 @@ tags:
  datasets:
  - common_voice
  model-index:
- - name: Wav2Vec2 XLS-R 300M Cantonese (zh-HK)
+ - name: Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM
  results:
  - task:
  name: Automatic Speech Recognition
@@ -20,7 +20,7 @@ model-index:
  metrics:
  - name: Test CER
  type: cer
- value: 31.73
+ value: 12.14
  - task:
  name: Automatic Speech Recognition
  type: automatic-speech-recognition
@@ -31,34 +31,45 @@ model-index:
  metrics:
  - name: Test CER
  type: cer
- value: 56.60
+ value: 56.86
  ---

- # Wav2Vec2 XLS-R 300M Cantonese (zh-HK)
+ # Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM

- Wav2Vec2 XLS-R 300M Cantonese (zh-HK) is an automatic speech recognition model based on the [XLS-R](https://arxiv.org/abs/2111.09296) architecture. This model is a fine-tuned version of [Wav2Vec2-XLS-R-300M](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the `zh-HK` subset of the [Common Voice](https://huggingface.co/datasets/common_voice) dataset.
+ Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM is an automatic speech recognition model based on the [XLS-R](https://arxiv.org/abs/2111.09296) architecture. This model is a fine-tuned version of [Wav2Vec2-XLS-R-300M](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the `zh-HK` subset of the [Common Voice](https://huggingface.co/datasets/common_voice) dataset. A 5-gram Language model, trained on multiple [PyCantonese](https://pycantonese.org/data.html) corpora, was then subsequently added to this model.

  This model was trained using HuggingFace's PyTorch framework and is part of the [Robust Speech Challenge Event](https://discuss.huggingface.co/t/open-to-the-community-robust-speech-recognition-challenge/13614) organized by HuggingFace. All training was done on a Tesla V100, sponsored by OVH.

- All necessary scripts used for training could be found in the [Files and versions](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-v2/tree/main) tab, as well as the [Training metrics](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-v2/tensorboard) logged via Tensorboard.
+ All necessary scripts used for training could be found in the [Files and versions](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-lm-v2/tree/main) tab, as well as the [Training metrics](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-lm-v2/tensorboard) logged via Tensorboard.
+
+ As for the N-gram language model training, we followed the [blog post tutorial](https://huggingface.co/blog/wav2vec2-with-ngram) provided by HuggingFace.

  ## Model

- | Model | #params | Arch. | Training/Validation data (text) |
- | ------------------------------ | ------- | ----- | ------------------------------- |
- | `wav2vec2-xls-r-300m-zh-HK-v2` | 300M | XLS-R | `Common Voice zh-HK` Dataset |
+ | Model | #params | Arch. | Training/Validation data (text) |
+ | --------------------------------- | ------- | ----- | ------------------------------- |
+ | `wav2vec2-xls-r-300m-zh-HK-lm-v2` | 300M | XLS-R | `Common Voice zh-HK` Dataset |

  ## Evaluation Results

- The model achieves the following results on evaluation:
+ The model achieves the following results on evaluation without a language model:
+
+ | Dataset | CER |
+ | -------------------------------- | ------ |
+ | `Common Voice` | 31.73% |
+ | `Robust Speech Event - Dev Data` | 56.60% |

- | Dataset | Loss | CER |
- | -------------------------------- | ------ | ------ |
- | `Common Voice` | 0.8089 | 31.73% |
- | `Robust Speech Event - Dev Data` | N/A | 56.60% |
+ With the addition of the language model, it achieves the following results:
+
+ | Dataset | CER |
+ | -------------------------------- | ------ |
+ | `Common Voice` | 12.14% |
+ | `Robust Speech Event - Dev Data` | 56.86% |

  ## Training procedure

+ The training process did not involve the addition of a language model. The following results were simply lifted from the original automatic speech recognition [model training](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-korean).
+
  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -160,7 +171,7 @@ Do consider the biases which came from pre-training datasets that may be carried

  ## Authors

- Wav2Vec2 XLS-R 300M Cantonese (zh-HK) was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development are done on OVH Cloud.
+ Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development are done on OVH Cloud.

  ## Framework versions

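
Reviewer note on the CER values changed in this diff: the sketch below shows how character error rate is commonly computed (character-level edit distance divided by reference length) and how the relative change between the reported numbers works out. It is a plain-Python illustration under that assumption, not the evaluation script used for this model card; the function names `edit_distance`, `cer`, and `relative_change` are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent (assumed definition, see note above)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def relative_change(before: float, after: float) -> float:
    """Relative change in percent; a negative value means the error rate dropped."""
    return 100.0 * (after - before) / before


# CER values copied from the evaluation tables in this diff.
print(round(relative_change(31.73, 12.14), 1))  # Common Voice
print(round(relative_change(56.60, 56.86), 1))  # Robust Speech Event - Dev Data
```

Read this way, the language model cuts the Common Voice CER by roughly 62% relative, while the Robust Speech Event dev data CER is essentially unchanged (a rise of about 0.5% relative).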