Update README.md
README.md CHANGED
@@ -134,75 +134,19 @@ It is a 1550M parameters multi-lingual ASR solution.
**Before:**

To transcribe audio samples, the model has to be used alongside a [`WhisperProcessor`](https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperProcessor).

```python
import librosa
import torch

from transformers import WhisperProcessor, WhisperForConditionalGeneration

model_path = 'ivrit-ai/whisper-large-v2-tuned'  # or a local path to the model
SAMPLING_RATE = 16000  # Whisper models expect 16 kHz audio

has_cuda = torch.cuda.is_available()

model = WhisperForConditionalGeneration.from_pretrained(model_path)
if has_cuda:
    model.to('cuda:0')

processor = WhisperProcessor.from_pretrained(model_path)

# audio_resample is based on `entry` being part of an existing dataset.
# Alternatively, the audio can be loaded from a file.
audio_resample = librosa.resample(entry['audio']['array'], orig_sr=entry['audio']['sampling_rate'], target_sr=SAMPLING_RATE)

input_features = processor(audio_resample, sampling_rate=SAMPLING_RATE, return_tensors="pt").input_features
if has_cuda:
    input_features = input_features.to('cuda:0')

predicted_ids = model.generate(input_features, language='he', num_beams=5)
transcript = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(f'Transcript: {transcript[0]}')
```

## Evaluation

You can use the [evaluate_model.py](https://github.com/yairl/ivrit.ai/blob/master/evaluate_model.py) reference on GitHub to evaluate the model's quality.

## Long-Form Transcription

The Whisper model is intrinsically designed to work on audio samples of up to 30s in duration. However, by using a chunking algorithm, it can transcribe audio samples of arbitrary length. This is possible through the Transformers [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) method. Chunking is enabled by setting `chunk_length_s=30` when instantiating the pipeline. With chunking enabled, the pipeline can be run with batched inference. It can also be extended to predict sequence-level timestamps by passing `return_timestamps=True`:

```python
>>> import torch
>>> from transformers import pipeline
>>> from datasets import load_dataset

>>> device = "cuda:0" if torch.cuda.is_available() else "cpu"

>>> pipe = pipeline(
>>>     "automatic-speech-recognition",
>>>     model="ivrit-ai/whisper-large-v2-tuned",
>>>     chunk_length_s=30,
>>>     device=device,
>>> )

>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
>>> sample = ds[0]["audio"]

>>> prediction = pipe(sample.copy(), batch_size=8)["text"]
" Mr. Quilter is the apostle of the middle classes, and we are glad to welcome his gospel."

>>> # we can also return timestamps for the predictions
>>> prediction = pipe(sample.copy(), batch_size=8, return_timestamps=True)["chunks"]
[{'text': ' Mr. Quilter is the apostle of the middle classes and we are glad to welcome his gospel.',
  'timestamp': (0.0, 5.44)}]
```

Refer to the blog post [ASR Chunking](https://huggingface.co/blog/asr-chunking) for more details on the chunking algorithm.

### BibTeX entry and citation info

**After:**

To transcribe audio samples, the model has to be used alongside a [`WhisperProcessor`](https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperProcessor).

```python
from faster_whisper import WhisperModel

# Load the CTranslate2 conversion of the tuned model
model = WhisperModel("sivan22/faster-whisper-ivrit-ai-whisper-large-v2-tuned")

# transcribe() returns a generator of segments plus transcription info
segments, info = model.transcribe("audio.mp3")
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
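
The same faster-whisper API also exposes device placement and decoding options. As a minimal sketch (assuming a CUDA-capable machine is available; `device`, `compute_type`, `language`, and `beam_size` are generic faster-whisper arguments rather than anything specific to this checkpoint), GPU inference with forced Hebrew decoding looks roughly like this:

```python
from faster_whisper import WhisperModel

# Run on GPU with FP16 weights (assumes a CUDA device is available)
model = WhisperModel(
    "sivan22/faster-whisper-ivrit-ai-whisper-large-v2-tuned",
    device="cuda",
    compute_type="float16",
)

# Force Hebrew decoding with beam search, mirroring the transformers
# example above (language='he', num_beams=5)
segments, info = model.transcribe("audio.mp3", language="he", beam_size=5)
for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```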

## Evaluation

You can use the [evaluate_model.py](https://github.com/yairl/ivrit.ai/blob/master/evaluate_model.py) reference on GitHub to evaluate the model's quality.
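
For a quick check outside the full reference script, word error rate can also be computed directly on a few reference/hypothesis pairs. This is only an illustrative sketch, assuming the third-party `jiwer` package and hypothetical `references`/`hypotheses` lists; it is not the evaluate_model.py script linked above:

```python
from jiwer import wer

# Hypothetical ground-truth transcripts and matching model outputs
references = ["שלום עולם", "מה שלומך"]
hypotheses = ["שלום עולם", "מה שלומך היום"]

# Corpus-level word error rate (lower is better)
error = wer(references, hypotheses)
print(f"WER: {error:.3f}")
```
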
### BibTeX entry and citation info