Toshiki Tomihira committed
Commit a1e755c · 1 Parent(s): 8ea065b

Update readme

README.md CHANGED
@@ -15,21 +15,25 @@ widget:

# Wav2Vec2-Base-960h

-[Facebook
-
-
-
-
-[Paper](https://arxiv.org/abs/2006.11477)
-
-Authors: Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli

-

-

-


# Usage

@@ -109,4 +113,12 @@ print("WER:", wer(result["text"], result["transcription"]))

| "clean" | "other" |
|---|---|
-| 3.4 | 8.6 |

@@ -15,21 +15,25 @@ widget:

# Wav2Vec2-Base-960h

+This repository is a reimplementation of [Facebook's official wav2vec2 model](https://huggingface.co/facebook/wav2vec2-base-960h).
+There is no published description of how to convert the wav2vec [pretrained model](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec#wav2vec-20) into a `pytorch_model.bin` file.
+We therefore rebuild `pytorch_model.bin` from the pretrained model.
+The conversion steps are as follows.

+```bash
+pip install "transformers[sentencepiece]"
+pip install fairseq -U

+git clone https://github.com/huggingface/transformers.git
+cp transformers/src/transformers/models/wav2vec2/convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py .

+wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_small_960h.pt -O ./wav2vec_small_960h.pt
+mkdir dict
+wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt -O ./dict/dict.ltr.txt  # keep the dictionary inside ./dict

+mkdir outputs
+python convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py --pytorch_dump_folder_path ./outputs --checkpoint_path ./wav2vec_small_960h.pt --dict_path ./dict  # some script versions may expect the file ./dict/dict.ltr.txt here
+```
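Before running the conversion script, it can help to confirm that the paths passed to `--checkpoint_path`, `--dict_path`, and `--pytorch_dump_folder_path` exist. The following is a minimal sketch in Python. The helper name `missing_paths` is hypothetical and is not part of transformers or fairseq, and the paths in the usage section are simply the ones from the command above.

```python
from pathlib import Path


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on disk, in input order."""
    return [str(p) for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    # Paths taken from the conversion command; adjust them to your own layout.
    required = ["./wav2vec_small_960h.pt", "./dict", "./outputs"]
    for path in missing_paths(required):
        print("missing:", path)
```

An empty result only means the paths exist. It does not check that the checkpoint or dictionary contents are valid.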

# Usage

@@ -109,4 +113,12 @@ print("WER:", wer(result["text"], result["transcription"]))

| "clean" | "other" |
|---|---|
+| 3.4 | 8.6 |
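The evaluation code ends with a `wer(...)` call, so the table appears to report word error rate (WER). As an illustration of what that metric measures, here is a minimal pure-Python sketch: word-level edit distance divided by the number of reference words. It is not the implementation used to produce the numbers in the table, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # At the start of iteration i, prev[j] is the edit distance
    # between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat", "the cat sit")` counts one substitution against three reference words.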
+
+
+# Reference
+
+
+[Facebook's Wav2Vec2](https://ai.facebook.com/blog/wav2vec-20-learning-the-structure-of-speech-from-raw-audio/)
+[Facebook's Hugging Face Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h)
+[Paper](https://arxiv.org/abs/2006.11477)