nguyenvulebinh committed on
Commit
f121903
·
1 Parent(s): 1b7c81a

Update README.md

Files changed (1)
  1. README.md +22 -4
README.md CHANGED
@@ -24,7 +24,7 @@ widget:
24
 
25
  [Our models](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h) are pre-trained on 13k hours of Vietnamese youtube audio (un-label data) and fine-tuned on 250 hours labeled of [VLSP ASR dataset](https://vlsp.org.vn/vlsp2020/eval/asr) on 16kHz sampled speech audio.
26
 
27
- We use wav2vec2 architecture for the pre-trained model. Follow wav2vec2 paper:
28
 
29
  >For the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
30
 
@@ -36,6 +36,7 @@ For fine-tuning phase, wav2vec2 is fine-tuned using Connectionist Temporal Class
36
 
37
  In a formal ASR system, two components are required: acoustic model and language model. Here ctc-wav2vec fine-tuned model works as an acoustic model. For the language model, we provide a [4-grams model](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h/blob/main/vi_lm_4grams.bin.zip) trained on 2GB of spoken text.
38
 
 
39
 
40
  ### Benchmark WER result:
41
 
@@ -84,11 +85,28 @@ predicted_ids = torch.argmax(logits, dim=-1)
84
  transcription = processor.batch_decode(predicted_ids)
85
  ```
86
 
87
- # License
88
 
89
- This model follows [CC-BY-NC-4.0](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h/raw/main/CC-BY-NC-SA-4.0.txt) license. Therefore, those compounds are freely available for academic purposes or individual research but restricted for commercial use.
90
 
91
  # Contact
92
 
93
 
94
  [![Follow](https://img.shields.io/twitter/follow/nguyenvulebinh?style=social)](https://twitter.com/intent/follow?screen_name=nguyenvulebinh)
 
24
 
25
  [Our models](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h) are pre-trained on 13k hours of Vietnamese youtube audio (un-label data) and fine-tuned on 250 hours labeled of [VLSP ASR dataset](https://vlsp.org.vn/vlsp2020/eval/asr) on 16kHz sampled speech audio.
26
 
27
+ We use [wav2vec2 architecture](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/) for the pre-trained model. Follow wav2vec2 paper:
28
 
29
  >For the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.
30
 
 
36
 
37
  In a formal ASR system, two components are required: acoustic model and language model. Here ctc-wav2vec fine-tuned model works as an acoustic model. For the language model, we provide a [4-grams model](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h/blob/main/vi_lm_4grams.bin.zip) trained on 2GB of spoken text.
38
 
39
+ Detail of training and fine-tuning process, the audience can follow [fairseq github](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec) and [huggingface blog](https://huggingface.co/blog/fine-tune-wav2vec2-english).
40
 
41
  ### Benchmark WER result:
42
 
 
85
  transcription = processor.batch_decode(predicted_ids)
86
  ```
87
 
88
+ ### Model Parameters License
89
 
90
+ The ASR model parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode
91
+
92
+ ### Citation
93
+
94
+ [![CITE](https://zenodo.org/badge/DOI/10.5281/zenodo.5356039.svg)](https://github.com/vietai/ASR)
95
+
96
+ ```text
97
+ @misc{Thai_Binh_Nguyen_wav2vec2_vi_2021,
98
+ author = {Thai Binh Nguyen},
99
+ doi = {10.5281/zenodo.5356039},
100
+ month = {09},
101
+ title = {{Vietnamese end-to-end speech recognition using wav2vec 2.0}},
102
+ url = {https://github.com/vietai/ASR},
103
+ year = {2021}
104
+ }
105
+ ```
106
+ **Please CITE** our repo when it is used to help produce published results or is incorporated into other software.
107
 
108
  # Contact
109
 
110
111
+
112
  [![Follow](https://img.shields.io/twitter/follow/nguyenvulebinh?style=social)](https://twitter.com/intent/follow?screen_name=nguyenvulebinh)