gigant committed on
Commit
1cd8351
·
1 Parent(s): 4140359

Update README.md

  1. README.md +81 -0
README.md CHANGED
@@ -70,6 +70,87 @@ The architecture is based on [facebook/wav2vec2-xls-r-300m](https://huggingface.
 
  More information needed
 
+ ## How to use
+
+ Make sure the dependencies for the language-model-boosted version are installed. The following command installs the `kenlm` and `pyctcdecode` libraries:
+
+ ```pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode```
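
To confirm that both dependencies are importable before loading the language-model-boosted processor, a minimal sketch like the following may help. The helper name `missing_packages` is hypothetical (it is not part of the README or of any library), it relies only on the standard-library `importlib.util.find_spec`, and it assumes that `kenlm` and `pyctcdecode` are also the import names of the two packages.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: the import names match the package names used in the pip command.
    missing = missing_packages(["kenlm", "pyctcdecode"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All language-model dependencies are installed.")
```

An empty list means both modules were found; this only checks that they can be located, not that they work.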
+
+
+ With the `transformers` framework, you can load the model with the following code:
+
+ ```
+ from transformers import AutoProcessor, AutoModelForCTC
+
+ processor = AutoProcessor.from_pretrained("gigant/romanian-wav2vec2")
+
+ model = AutoModelForCTC.from_pretrained("gigant/romanian-wav2vec2")
+ ```
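
A CTC model emits one token prediction per audio frame, and the processor turns that sequence into text. As an illustration of the idea only, here is a minimal sketch of greedy CTC decoding: collapse consecutive repeats, then drop the blank token. This is a simplified stand-in, not the language-model-boosted beam search that `pyctcdecode` performs. The function name, the toy vocabulary, and the blank index are all hypothetical.

```python
def ctc_greedy_decode(frame_ids, vocab, blank_id=0):
    """Collapse repeated ids, then drop blanks (greedy CTC decoding)."""
    chars = []
    previous = None
    for idx in frame_ids:
        # Emit a token only when it differs from the previous frame and is not blank.
        if idx != previous and idx != blank_id:
            chars.append(vocab[idx])
        previous = idx
    return "".join(chars)


# Hypothetical toy vocabulary; the real model ships its own tokenizer.
toy_vocab = {0: "<pad>", 1: "a", 2: "b", 3: "c"}
# Most likely token id per frame, e.g. the argmax over the model's logits.
frames = [1, 1, 0, 1, 2, 2, 0, 0, 3]
print(ctc_greedy_decode(frames, toy_vocab))
```

The blank between the two runs of `1` is what lets a repeated character survive the collapsing step.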
+
+ Or, if you just want to test the model, you can load the automatic speech recognition pipeline from `transformers`:
+
+ ```
+ from transformers import pipeline
+
+ asr = pipeline("automatic-speech-recognition", model="gigant/romanian-wav2vec2")
+ ```
+
+
+ ## Example use with the `datasets` library
+
+ First, load your data.
+
+ We will use the [Romanian Speech Synthesis](https://huggingface.co/datasets/gigant/romanian_speech_synthesis_0_8_1) dataset in this example.
+
+ ```
+ from datasets import load_dataset
+
+ dataset = load_dataset("gigant/romanian_speech_synthesis_0_8_1")
+ ```
+
+ You can listen to the samples with the `IPython.display` module:
+
+ ```
+ from IPython.display import Audio
+
+ i = 0
+ sample = dataset["train"][i]
+ Audio(sample["audio"]["array"], rate=sample["audio"]["sampling_rate"])
+ ```
+
+ The model is trained on audio sampled at 16 kHz, so if the audio in the dataset uses a different sampling rate, it has to be resampled.
+
+ In this example, the audio is sampled at 48 kHz. You can verify this by checking `dataset["train"][0]["audio"]["sampling_rate"]`.
+
+ The following code resamples the audio using the `torchaudio` library:
+
+ ```
+ import torchaudio
+ import torch
+
+ i = 0
+ audio = sample["audio"]["array"]
+ rate = sample["audio"]["sampling_rate"]
+ resampler = torchaudio.transforms.Resample(rate, 16_000)
+ audio_16 = resampler(torch.tensor(audio, dtype=torch.float32)).numpy()
+ ```
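
Going from 48 kHz to 16 kHz keeps roughly one output sample for every three input samples. The arithmetic can be sketched with the standard library alone. Both function names are hypothetical, and rounding the output length up is an assumption about how a resampler rounds, so treat the length as approximate.

```python
from fractions import Fraction
from math import ceil


def resample_ratio(orig_rate, target_rate):
    """Return the reduced target/orig ratio as (numerator, denominator)."""
    ratio = Fraction(target_rate, orig_rate)
    return ratio.numerator, ratio.denominator


def expected_length(num_samples, orig_rate, target_rate):
    """Approximate sample count after resampling (rounding up is an assumption)."""
    return ceil(num_samples * Fraction(target_rate, orig_rate))


# Two seconds of 48 kHz audio resampled to 16 kHz.
print(resample_ratio(48_000, 16_000))
print(expected_length(96_000, 48_000, 16_000))
```

Comparing `len(audio_16)` with `expected_length(len(audio), rate, 16_000)` is a quick sanity check that the resampling step behaved as intended.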
+
+ To listen to the resampled sample:
+
+ ```
+ Audio(audio_16, rate=16000)
+ ```
+
+ Now you can get the model prediction by running:
+
+ ```
+ predicted_text = asr(audio_16)["text"]
+ ground_truth = dataset["train"][i]["sentence"]
+
+ print(f"Predicted text : {predicted_text}")
+ print(f"Ground truth : {ground_truth}")
+ ```
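
To put a number on how close the prediction is to the ground truth, one common measure is the word error rate (WER): the word-level edit distance divided by the number of reference words. The following is a minimal pure-Python sketch, not the evaluation procedure used for this model. The function name is hypothetical, and lowercasing plus whitespace splitting is an assumed normalization.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Toy strings for illustration only, not real model output.
print(word_error_rate("a b c d", "a x c d"))
```

In the example above, the arguments would be `ground_truth` and the pipeline's transcription string (the value under its `"text"` key).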
+
  ## Training and evaluation data
 
  Training data :