nguyenvulebinh committed · Commit f121903 · Parent(s): 1b7c81a

Update README.md

README.md CHANGED
[Our models](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h) are pre-trained on 13k hours of unlabeled Vietnamese YouTube audio and fine-tuned on 250 hours of labeled audio from the [VLSP ASR dataset](https://vlsp.org.vn/vlsp2020/eval/asr), all 16kHz sampled speech audio.

We use the [wav2vec2 architecture](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) for the pre-trained model. From the wav2vec2 paper:

> We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
For the fine-tuning phase, wav2vec2 is fine-tuned using Connectionist Temporal Classification (CTC).

In a formal ASR system, two components are required: an acoustic model and a language model. Here the CTC fine-tuned wav2vec model works as the acoustic model. For the language model, we provide a [4-gram model](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h/blob/main/vi_lm_4grams.bin.zip) trained on 2 GB of spoken text.

For details of the training and fine-tuning process, see the [fairseq GitHub repository](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec) and this [Hugging Face blog post](https://huggingface.co/blog/fine-tune-wav2vec2-english).
### Benchmark WER result:
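WER (word error rate) is the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal sketch of how it is computed, using only the standard library — the example sentences are invented, not benchmark data:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("xin chào việt nam", "xin chào viet nam"))  # -> 0.25 (1 substitution / 4 words)
```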
```python
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
```
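The snippet above uses greedy (argmax) decoding with the acoustic model alone. The provided 4-gram LM is instead applied during beam-search decoding; as a toy illustration of why the LM helps, here is score combination over two candidate transcripts — all scores and candidates below are invented for illustration, and a real decoder runs a beam search against the KenLM binary rather than scoring whole sentences:

```python
# Hypothetical acoustic log-scores from the CTC model:
candidates = {
    "xin chào việt nam": -4.1,   # correct text, slightly worse acoustically
    "xin chao viết nam": -3.9,   # acoustically best, but implausible Vietnamese
}
# Hypothetical LM log-scores (a real 4-gram model would supply these):
lm_scores = {
    "xin chào việt nam": -2.0,
    "xin chao viết nam": -7.5,
}

alpha = 0.5  # LM weight, tuned on held-out data in practice
best = max(candidates, key=lambda s: candidates[s] + alpha * lm_scores[s])
print(best)  # -> xin chào việt nam
```

Even though the wrong candidate scores better acoustically, the LM penalty for implausible text flips the decision.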
### Model Parameters License

The ASR model parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode
### Citation

[![CITE](https://zenodo.org/badge/DOI/10.5281/zenodo.5356039.svg)](https://github.com/vietai/ASR)

```text
@misc{Thai_Binh_Nguyen_wav2vec2_vi_2021,
  author = {Thai Binh Nguyen},
  doi    = {10.5281/zenodo.5356039},
  month  = {09},
  title  = {{Vietnamese end-to-end speech recognition using wav2vec 2.0}},
  url    = {https://github.com/vietai/ASR},
  year   = {2021}
}
```

**Please CITE** our repo when it is used to help produce published results or is incorporated into other software.
# Contact

[![Follow](https://img.shields.io/twitter/follow/nguyenvulebinh?style=social)](https://twitter.com/intent/follow?screen_name=nguyenvulebinh)