w11wo/wav2vec2-xls-r-300m-zh-HK-v2

@@ -8,7 +8,7 @@ tags:
 datasets:
   - common_voice
 model-index:
-  - name: Wav2Vec2 XLS-R 300M Cantonese (zh-HK)
     results:
       - task:
           name: Automatic Speech Recognition
@@ -20,7 +20,7 @@ model-index:
         metrics:
           - name: Test CER
             type: cer
-            value: 31.73
       - task:
           name: Automatic Speech Recognition
           type: automatic-speech-recognition
@@ -31,34 +31,45 @@ model-index:
         metrics:
           - name: Test CER
             type: cer
-            value: 56.60
 ---
-# Wav2Vec2 XLS-R 300M Cantonese (zh-HK)
-Wav2Vec2 XLS-R 300M Cantonese (zh-HK) is an automatic speech recognition model based on the [XLS-R](https://arxiv.org/abs/2111.09296) architecture. This model is a fine-tuned version of [Wav2Vec2-XLS-R-300M](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the `zh-HK` subset of the [Common Voice](https://huggingface.co/datasets/common_voice) dataset.
 This model was trained using HuggingFace's PyTorch framework and is part of the [Robust Speech Challenge Event](https://discuss.huggingface.co/t/open-to-the-community-robust-speech-recognition-challenge/13614) organized by HuggingFace. All training was done on a Tesla V100, sponsored by OVH.
-All necessary scripts used for training could be found in the [Files and versions](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-v2/tree/main) tab, as well as the [Training metrics](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-v2/tensorboard) logged via Tensorboard.
 ## Model
-| Model                          | #params | Arch. | Training/Validation data (text) |
-| ------------------------------ | ------- | ----- | ------------------------------- |
-| `wav2vec2-xls-r-300m-zh-HK-v2` | 300M    | XLS-R | `Common Voice zh-HK` Dataset    |
 ## Evaluation Results
-The model achieves the following results on evaluation:
-| Dataset                          | Loss   | CER    |
-| -------------------------------- | ------ | ------ |
-| `Common Voice`                   | 0.8089 | 31.73% |
-| `Robust Speech Event - Dev Data` | N/A    | 56.60% |
 ## Training procedure
 ### Training hyperparameters
 The following hyperparameters were used during training:
@@ -160,7 +171,7 @@ Do consider the biases which came from pre-training datasets that may be carried
 ## Authors
-Wav2Vec2 XLS-R 300M Cantonese (zh-HK) was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development are done on OVH Cloud.
 ## Framework versions

 datasets:
   - common_voice
 model-index:
+  - name: Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM
     results:
       - task:
           name: Automatic Speech Recognition
         metrics:
           - name: Test CER
             type: cer
+            value: 12.14
       - task:
           name: Automatic Speech Recognition
           type: automatic-speech-recognition
         metrics:
           - name: Test CER
             type: cer
+            value: 56.86
 ---
+# Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM
+Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM is an automatic speech recognition model based on the [XLS-R](https://arxiv.org/abs/2111.09296) architecture. This model is a fine-tuned version of [Wav2Vec2-XLS-R-300M](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the `zh-HK` subset of the [Common Voice](https://huggingface.co/datasets/common_voice) dataset. A 5-gram language model, trained on multiple [PyCantonese](https://pycantonese.org/data.html) corpora, was then added to this model.
 This model was trained using HuggingFace's PyTorch framework and is part of the [Robust Speech Challenge Event](https://discuss.huggingface.co/t/open-to-the-community-robust-speech-recognition-challenge/13614) organized by HuggingFace. All training was done on a Tesla V100, sponsored by OVH.
+All scripts used for training can be found in the [Files and versions](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-lm-v2/tree/main) tab, along with the [Training metrics](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-lm-v2/tensorboard) logged via TensorBoard.
+To train the n-gram language model, we followed HuggingFace's [blog post tutorial](https://huggingface.co/blog/wav2vec2-with-ngram).
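To illustrate what "5-gram" means here, the sketch below builds a toy n-gram model that predicts each token from the previous four. It is a simplified stand-in, not the model used in this card: the HuggingFace tutorial trains the real model with KenLM, which uses modified Kneser-Ney smoothing, whereas this sketch uses plain add-one smoothing. It also assumes the tokens are single Cantonese characters; the tokenization actually used for this model is not stated here. The class name `ToyCharNGramLM` and the sample sentences are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Iterator, Tuple

BOS, EOS = "<s>", "</s>"


def char_ngrams(text: str, n: int) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (context, next_token) pairs, where context is the previous n-1 tokens."""
    padded = [BOS] * (n - 1) + list(text) + [EOS]
    for i in range(n - 1, len(padded)):
        yield tuple(padded[i - n + 1:i]), padded[i]


class ToyCharNGramLM:
    """Hypothetical character n-gram LM with add-one smoothing (not KenLM)."""

    def __init__(self, corpus: Iterable[str], n: int = 5) -> None:
        self.n = n
        self.context_counts: Counter = Counter()
        self.ngram_counts: Counter = Counter()
        self.vocab = {EOS}
        for sentence in corpus:
            self.vocab.update(sentence)  # every character becomes a vocab entry
            for ctx, tok in char_ngrams(sentence, n):
                self.context_counts[ctx] += 1
                self.ngram_counts[(ctx, tok)] += 1

    def log_prob(self, sentence: str) -> float:
        """Sum of log P(token | previous n-1 tokens) over the sentence, plus </s>."""
        v = len(self.vocab)
        total = 0.0
        for ctx, tok in char_ngrams(sentence, self.n):
            total += math.log((self.ngram_counts[(ctx, tok)] + 1) / (self.context_counts[ctx] + v))
        return total


if __name__ == "__main__":
    lm = ToyCharNGramLM(["我哋去飲茶", "你食咗飯未", "我哋去睇戲"])
    # A word order seen in the corpus scores higher than a scrambled one.
    print(lm.log_prob("我哋去飲茶") > lm.log_prob("茶飲去哋我"))  # True
```

During decoding, a beam search can add a weighted LM score like this to the acoustic score of each candidate transcript, which is how an n-gram model can favor fluent Cantonese output over acoustically similar alternatives.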
 ## Model
+| Model                             | #params | Arch. | Training/Validation data (text) |
+| --------------------------------- | ------- | ----- | ------------------------------- |
+| `wav2vec2-xls-r-300m-zh-HK-lm-v2` | 300M    | XLS-R | `Common Voice zh-HK` Dataset    |
 ## Evaluation Results
+Without a language model, the model achieves the following results on evaluation:
+| Dataset                          | CER    |
+| -------------------------------- | ------ |
+| `Common Voice`                   | 31.73% |
+| `Robust Speech Event - Dev Data` | 56.60% |
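Without a language model, CTC output is usually decoded greedily: pick the most likely token in each audio frame, merge consecutive repeats, then drop the blank token. The sketch below shows that procedure, assuming a toy three-entry vocabulary with the blank (padding) token at index 0. The real vocabulary and blank index of this model are not listed here, and `greedy_ctc_decode` is a hypothetical helper, not a library function.

```python
from typing import List, Sequence


def greedy_ctc_decode(
    frame_probs: Sequence[Sequence[float]],
    vocab: List[str],
    blank_id: int = 0,  # assumption: blank/pad token at index 0
) -> str:
    """Greedy (best-path) CTC decoding of per-frame token probabilities."""
    # 1. Most likely token id for each frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs]
    # 2. Collapse consecutive repeats and 3. remove blanks.
    out: List[str] = []
    prev = None
    for idx in best_ids:
        if idx != prev and idx != blank_id:
            out.append(vocab[idx])
        prev = idx
    return "".join(out)


if __name__ == "__main__":
    vocab = ["<pad>", "你", "好"]
    frames = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
        [0.1, 0.2, 0.7],
    ]
    print(greedy_ctc_decode(frames, vocab))  # 你好
```

A blank between two identical tokens keeps them distinct, so `[你, 好, <pad>, 好]` decodes to `你好好` rather than `你好`.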
+With the addition of the language model, it achieves the following results:
+| Dataset                          | CER    |
+| -------------------------------- | ------ |
+| `Common Voice`                   | 12.14% |
+| `Robust Speech Event - Dev Data` | 56.86% |
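Character error rate (CER) is the character-level edit distance between hypothesis and reference, divided by the number of reference characters. The sketch below computes it at the corpus level (total edits over total reference characters), which is an assumption; the evaluation script in this repository may also normalize text before scoring. It then computes the relative change in CER between the two tables above. `levenshtein`, `cer`, and `relative_change` are illustrative helpers, not functions from this repository.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,              # deletion
                curr[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),   # substitution (free if equal)
            )
        prev = curr
    return prev[-1]


def cer(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return edits / sum(len(r) for r in refs)


def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after` (negative means improvement for CER)."""
    return (after - before) / before


if __name__ == "__main__":
    # One deletion (哋) and one insertion (啦) over 5 reference characters.
    print(cer(["我哋去飲茶"], ["我去飲茶啦"]))  # 0.4
    print(round(relative_change(31.73, 12.14), 4))  # Common Voice, -0.6174
    print(round(relative_change(56.60, 56.86), 4))  # Robust Speech Event dev, 0.0046
```

On the reported numbers, adding the language model lowers Common Voice CER by 19.59 points (about 62% relative). On the Robust Speech Event dev data, CER rises slightly, by 0.26 points.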
 ## Training procedure
+The training process did not involve the language model. The hyperparameters and results below are taken directly from the original automatic speech recognition [model training](https://huggingface.co/w11wo/wav2vec2-xls-r-300m-zh-HK-v2).
 ### Training hyperparameters
 The following hyperparameters were used during training:
 ## Authors
+Wav2Vec2 XLS-R 300M Cantonese (zh-HK) LM was trained and evaluated by [Wilson Wongso](https://w11wo.github.io/). All computation and development were done on OVH Cloud.
 ## Framework versions