DewiBrynJones committed on
Commit
7728ebf
1 Parent(s): 7460a75

Update README.md

Files changed (1)
  1. README.md +100 -8
README.md CHANGED
@@ -26,20 +26,112 @@ model-index:
26
  value: 28.33
27
  ---
28
 
29
- # Wav2Vec2-Large-XLSR-53-Welsh
30
 
31
- Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) in Welsh using the [Common Voice](https://huggingface.co/datasets/common_voice) dataset.
32
 
33
- The code for training, using and evaluating this model can be found at GitHub: https://github.com/techiaith/xlsr-fine-tuning-week
34
 
35
- With a WER of 28.33%, here are some example predictions from the Common Voice Welsh test set:
36
 
37
- **Prediction:** rhedais i ffwrdd heb ddweud dim wrthi ym beth digwyddodd
38
 
39
- **Reference:** Rhedais i ffwrdd heb ddweud dim wrthi am beth ddigwyddodd.
40
 
 
 
 
 
 
41
 
42
- **Prediction:** ac yr oedd y ferch yn ofnus d
43
 
44
- **Reference:** Ac yr oedd y ferch yn ofnus.
 
26
  value: 28.33
27
  ---
28
 
29
+ # Wav2Vec2-Large-XLSR-Welsh
30
 
31
+ Fine-tuned [facebook/wav2vec2-large-xlsr-53](https://huggingface.co/facebook/wav2vec2-large-xlsr-53) on Welsh using the [Common Voice dataset](https://huggingface.co/datasets/common_voice).
32
 
33
+ When using this model, make sure that your speech input is sampled at 16 kHz.
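As a minimal sketch of how the 16 kHz requirement might be checked before inference, the snippet below reads the sampling rate from a WAV header using only the Python standard library. The helper names `wav_sample_rate` and `needs_resampling` are hypothetical and not part of this model card or of any library, and the sketch assumes WAV input (Common Voice itself ships compressed audio, which the scripts in this README load with `torchaudio`).

```python
import wave

EXPECTED_RATE = 16_000  # sampling rate (Hz) this model card asks for


def wav_sample_rate(path):
    """Return the sampling rate (Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE):
    """Return True if the file's rate differs from the expected rate."""
    return wav_sample_rate(path) != expected_rate
```

If `needs_resampling` returns `True`, the audio would have to be resampled (for example with `torchaudio.transforms.Resample`, as in the scripts in this README) before it is passed to the processor.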
34
 
 
35
 
36
+ ## Usage
37
 
38
+ The model can be used directly (without a language model) as follows:
39
 
40
+ ```python
+ import torch
+ import torchaudio
+ from datasets import load_dataset
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

+ test_dataset = load_dataset("common_voice", "cy", split="test[:2%]")

+ processor = Wav2Vec2Processor.from_pretrained("DewiBrynJones/wav2vec2-large-xlsr-welsh")
+ model = Wav2Vec2ForCTC.from_pretrained("DewiBrynJones/wav2vec2-large-xlsr-welsh")

+ # Resample from 48 kHz (the rate assumed for the input clips) to the 16 kHz the model expects.
+ resampler = torchaudio.transforms.Resample(48_000, 16_000)
+
+ # Preprocess the dataset:
+ # read each audio file into an array and resample it.
+ def speech_file_to_array_fn(batch):
+     speech_array, sampling_rate = torchaudio.load(batch["path"])
+     batch["speech"] = resampler(speech_array).squeeze().numpy()
+     return batch
+
+ test_dataset = test_dataset.map(speech_file_to_array_fn)
+ inputs = processor(test_dataset["speech"][:2], sampling_rate=16_000, return_tensors="pt", padding=True)
+
+ # Inference only, so gradients are not needed.
+ with torch.no_grad():
+     logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits
+
+ # Pick the most likely token for each audio frame.
+ predicted_ids = torch.argmax(logits, dim=-1)
+
+ print("Prediction:", processor.batch_decode(predicted_ids))
+ print("Reference:", test_dataset["sentence"][:2])
+ ```
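The `torch.argmax` and `batch_decode` steps above amount to greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, drop the blank token, and turn word delimiters into spaces. The sketch below illustrates that idea in plain Python. The vocabulary, token ids and the function `ctc_greedy_decode` are hypothetical, chosen only for illustration; the real vocabulary ships with the model's processor.

```python
from itertools import groupby

# Hypothetical toy vocabulary, for illustration only.
VOCAB = {0: "<pad>", 1: "|", 2: "a", 3: "c", 4: "y", 5: "r", 6: "d"}
BLANK_ID = 0          # id assumed to act as the CTC blank token
WORD_DELIMITER = "|"  # token assumed to mark a space between words


def ctc_greedy_decode(frame_ids):
    """Collapse repeated ids, drop blanks, and map the rest to text."""
    # Consecutive duplicates are merged into a single id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Blanks are removed only after collapsing, so a blank between two
    # identical ids keeps them as two separate characters.
    chars = [VOCAB[i] for i in collapsed if i != BLANK_ID]
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


# One id per audio frame, as an argmax over the logits would produce.
frame_ids = [2, 2, 0, 3, 3, 1, 1, 4, 0, 5, 5, 0]
print(ctc_greedy_decode(frame_ids))  # prints "ac yr"
```

Because blanks are dropped after repeats are collapsed, a doubled letter such as the Welsh "dd" survives only when a blank frame separates the two occurrences.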
71
+
72
+
73
+ ## Evaluation
74
+
75
+ The model can be evaluated as follows on the Welsh test data of Common Voice.
76
+
77
+
78
+ ```python
+ import torch
+ import torchaudio
+ from datasets import load_dataset, load_metric
+ from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+ import re
+
+ test_dataset = load_dataset("common_voice", "cy", split="test")
+
+ wer = load_metric("wer")
+
+ processor = Wav2Vec2Processor.from_pretrained("DewiBrynJones/wav2vec2-large-xlsr-welsh")
+ model = Wav2Vec2ForCTC.from_pretrained("DewiBrynJones/wav2vec2-large-xlsr-welsh")
+
+ model.to("cuda")
+
+ # Punctuation removed from the reference sentences before scoring.
+ chars_to_ignore_regex = r'[\,\?\.\!\-\;\:\"\“]'
+
+ # Resample from 48 kHz (the rate assumed for the input clips) to the 16 kHz the model expects.
+ resampler = torchaudio.transforms.Resample(48_000, 16_000)
+
+ # Preprocess the dataset:
+ # normalize each reference sentence, then read its audio file into a resampled array.
+ def speech_file_to_array_fn(batch):
+     batch["sentence"] = re.sub(chars_to_ignore_regex, '', batch["sentence"]).lower()
+     speech_array, sampling_rate = torchaudio.load(batch["path"])
+     batch["speech"] = resampler(speech_array).squeeze().numpy()
+     return batch
+
+ test_dataset = test_dataset.map(speech_file_to_array_fn)
+
+ # Run batched inference on the GPU and decode the predictions.
+ def evaluate(batch):
+     inputs = processor(batch["speech"], sampling_rate=16_000, return_tensors="pt", padding=True)
+
+     with torch.no_grad():
+         logits = model(inputs.input_values.to("cuda"), attention_mask=inputs.attention_mask.to("cuda")).logits
+
+     pred_ids = torch.argmax(logits, dim=-1)
+     batch["pred_strings"] = processor.batch_decode(pred_ids)
+     return batch
+
+ result = test_dataset.map(evaluate, batched=True, batch_size=8)
+
+ print("WER: {:.2f}".format(100 * wer.compute(predictions=result["pred_strings"], references=result["sentence"])))
+ ```
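The reference normalization used in the evaluation script can be tried in isolation. The sketch below applies the same character class and lowercasing to one of the reference sentences from this README; the function name `normalize` is a hypothetical wrapper, not something defined by the script or by a library.

```python
import re

# Same character class as in the evaluation script.
CHARS_TO_IGNORE = r'[\,\?\.\!\-\;\:\"\“]'


def normalize(sentence):
    """Strip the ignored punctuation and lowercase the sentence."""
    return re.sub(CHARS_TO_IGNORE, "", sentence).lower()


print(normalize("Ac yr oedd y ferch yn ofnus."))
# prints "ac yr oedd y ferch yn ofnus"
```

Normalizing the references this way keeps casing and punctuation differences from being counted as word errors, since the model's predictions are lowercase and unpunctuated.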
124
+
125
+ **Test Result**: 28.33%
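For readers unfamiliar with the metric, word error rate (WER) is the number of word substitutions, insertions and deletions needed to turn the prediction into the reference, divided by the number of reference words. The sketch below is a plain-Python illustration, not the implementation behind `load_metric("wer")`; it assumes errors are pooled over all sentences before dividing, and the function names are hypothetical. It is applied to the two example predictions from this README, with the references lowercased and stripped of punctuation.

```python
def word_errors(reference, prediction):
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), prediction.split()
    # prev[j] holds the distance between the reference words seen so far
    # and the first j predicted words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # deletion
                curr[j - 1] + 1,                       # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = curr
    return prev[-1], len(ref)


def word_error_rate(references, predictions):
    """Total word errors divided by total reference words."""
    pairs = [word_errors(r, p) for r, p in zip(references, predictions)]
    return sum(e for e, _ in pairs) / sum(n for _, n in pairs)


references = [
    "rhedais i ffwrdd heb ddweud dim wrthi am beth ddigwyddodd",
    "ac yr oedd y ferch yn ofnus",
]
predictions = [
    "rhedais i ffwrdd heb ddweud dim wrthi ym beth digwyddodd",
    "ac yr oedd y ferch yn ofnus d",
]
print("WER: {:.2f}".format(100 * word_error_rate(references, predictions)))
# prints "WER: 17.65" (3 errors over 17 reference words)
```

These two sentences are only a worked illustration; the 28.33% figure reported here comes from the full Common Voice Welsh test set.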
126
+
127
+
128
+ ## Training
129
+
130
+ A Docker-based setup for training and evaluating this model is available on GitHub: https://github.com/techiaith/xlsr-fine-tuning-week
131
+
132
+ ## Example Predictions
133
+
134
+ | Prediction | Reference |
+ |---|---|
+ | rhedais i ffwrdd heb ddweud dim wrthi ym beth digwyddodd | Rhedais i ffwrdd heb ddweud dim wrthi am beth ddigwyddodd. |
+ | ac yr oedd y ferch yn ofnus d | Ac yr oedd y ferch yn ofnus. |