abdouaziiz
/

wav2vec2-xls-r-300m-wolof-lm

Automatic Speech Recognition


abdouaziiz committed on Dec 19, 2021

Commit

c542d41

•

1 Parent(s): 3f7fb84

Update README.md

Files changed (1)

README.md +64 -4

@@ -76,7 +76,67 @@ The following hyperparameters were used during training:
 | 27000 | 0.084400 |	0.367826 |	0.212565 |
-### Framework versions
-- Transformers 4.11
-- Pytorch 1.10.0
-- Datasets 1.13

+## Usage
+The model can be used directly (without a language model) as follows:
+```python
+import re
+import warnings
+import librosa
+import pandas as pd
+import torch
+from transformers import AutoProcessor, AutoModelForCTC
+from datasets import Dataset, DatasetDict, load_metric
+wer_metric = load_metric("wer")
+wolof = pd.read_csv('Test.csv')  # Test.csv must contain the columns 'file' and 'transcription'
+wolof = DatasetDict({'test': Dataset.from_pandas(wolof)})
+chars_to_ignore_regex = r'[\"\?\.\!\-\;\:\(\)\,]'  # raw string avoids invalid-escape warnings
+def remove_special_characters(batch):
+    batch["transcription"] = re.sub(chars_to_ignore_regex, '', batch["transcription"]).lower() + " "
+    return batch
+wolof = wolof.map(remove_special_characters)
+processor = AutoProcessor.from_pretrained("abdouaziiz/wav2vec2-xls-r-300m-wolof")
+model = AutoModelForCTC.from_pretrained("abdouaziiz/wav2vec2-xls-r-300m-wolof")
+warnings.filterwarnings("ignore")
+def speech_file_to_array_fn(batch):
+    speech_array, sampling_rate = librosa.load(batch["file"], sr = 16000)
+    batch["speech"] = speech_array.astype('float16')
+    batch["sampling_rate"] = sampling_rate
+    batch["target_text"] = batch["transcription"]
+    return batch
+wolof = wolof.map(speech_file_to_array_fn, remove_columns=wolof.column_names["test"], num_proc=1)
+def map_to_result(batch):
+    model.to("cuda")
+    input_values = processor(
+      batch["speech"],
+      sampling_rate=batch["sampling_rate"],
+      return_tensors="pt"
+    ).input_values.to("cuda")
+    with torch.no_grad():
+        logits = model(input_values).logits
+        pred_ids = torch.argmax(logits, dim=-1)
+        batch["pred_str"] = processor.batch_decode(pred_ids)[0]
+    return batch
+results = wolof["test"].map(map_to_result)
+# Score against "target_text", the reference column created in speech_file_to_array_fn
+print("Test WER: {:.3f}".format(wer_metric.compute(predictions=results["pred_str"], references=results["target_text"])))
+```
+## PS:
+The results can be improved by:
+- Using Wav2vec2 with a language model.
+- Building a spellchecker from the text of the data.
+- Applying sentence edit distance.
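
The spellchecker and edit-distance ideas in the list above could be combined roughly as follows. This is only a minimal sketch and not the author's method: the helper names `edit_distance`, `build_vocab`, and `correct`, the `max_dist` threshold, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or word lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def build_vocab(transcriptions):
    """Count word frequencies in the reference transcriptions."""
    return Counter(w for t in transcriptions for w in t.split())


def correct(sentence, vocab, max_dist=1):
    """Replace each out-of-vocabulary word with its closest vocabulary word.

    max_dist is a hypothetical threshold: words farther than this from
    every vocabulary entry are left unchanged.
    """
    if not vocab:
        return sentence
    fixed = []
    for word in sentence.split():
        if word in vocab:
            fixed.append(word)
            continue
        # Closest word by edit distance; ties go to the more frequent word.
        best = min(vocab, key=lambda v: (edit_distance(word, v), -vocab[v]))
        fixed.append(best if edit_distance(word, best) <= max_dist else word)
    return " ".join(fixed)
```

In practice the vocabulary would be built from the training transcriptions and `correct` applied to each `pred_str` before computing WER. Because `edit_distance` accepts any sequence, passing two lists of words gives a sentence-level distance as well.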