|
--- |
|
license: apache-2.0 |
|
--- |
|
|
|
# faster-whisper-large-v3 |
|
|
|
This is the Whisper large-v3 model converted to the CTranslate2 format for use with [faster-whisper](https://github.com/guillaumekln/faster-whisper).
|
|
|
## Usage
|
|
|
You can choose between monkey-patching faster-whisper 0.9.0 (until it officially supports large-v3) or using my fork (which is easier).
|
|
|
|
|
### Using my fork |
|
|
|
First, install it by executing: |
|
|
|
```shell
pip install -U 'transformers[torch]>=4.35.0' https://github.com/PythonicCafe/faster-whisper/archive/refs/heads/feature/large-v3.zip#egg=faster-whisper
```
|
|
|
Then, use it as you would the regular faster-whisper:
|
|
|
```python
import time

import faster_whisper

filename = "my-audio.mp3"
initial_prompt = "My podcast recording"  # Or `None`
word_timestamps = False
vad_filter = True
temperature = 0.0
language = "pt"
model_size = "large-v3"
device, compute_type = "cuda", "float16"
# or: device, compute_type = "cpu", "float32"

model = faster_whisper.WhisperModel(model_size, device=device, compute_type=compute_type)

segments, transcription_info = model.transcribe(
    filename,
    word_timestamps=word_timestamps,
    vad_filter=vad_filter,
    temperature=temperature,
    language=language,
    initial_prompt=initial_prompt,
)
print(transcription_info)

start_time = time.time()
for segment in segments:
    row = {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text,
    }
    if word_timestamps:
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in segment.words
        ]
    print(row)
end_time = time.time()
print(f"Transcription finished in {end_time - start_time:.2f}s")
```
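If you want subtitle output rather than printed rows, the `start`/`end` values of each segment (seconds, as floats) can be converted to SRT-style timestamps. A minimal sketch (the `to_srt_timestamp` helper and the hard-coded `segments` list are illustrative, not part of faster-whisper):

```python
def to_srt_timestamp(seconds: float) -> str:
    """Convert seconds to an SRT timestamp like 00:01:02,345."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

# Example: render (start, end, text) tuples as numbered SRT blocks
segments = [(0.0, 2.5, "Hello"), (2.5, 63.345, "world")]
for index, (start, end, text) in enumerate(segments, start=1):
    print(f"{index}\n{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}\n{text}\n")
```

In the real loop you would use `segment.start`, `segment.end` and `segment.text` instead of the tuples above.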
|
|
|
|
|
### Monkey-patching faster-whisper 0.9.0 |
|
|
|
Make sure you have the latest version: |
|
|
|
```shell
pip install -U 'faster-whisper>=0.9.0'
```
|
|
|
Then, use it with a few small changes:
|
|
|
```python
import time

import faster_whisper.transcribe

# Monkey patch 1 (add the model to the list of known models)
faster_whisper.utils._MODELS["large-v3"] = "turicas/faster-whisper-large-v3"

# Monkey patch 2 (fix the Tokenizer's encode method)
faster_whisper.transcribe.Tokenizer.encode = lambda self, text: self.tokenizer.encode(text, add_special_tokens=False)

filename = "my-audio.mp3"
initial_prompt = "My podcast recording"  # Or `None`
word_timestamps = False
vad_filter = True
temperature = 0.0
language = "pt"
model_size = "large-v3"
device, compute_type = "cuda", "float16"
# or: device, compute_type = "cpu", "float32"

model = faster_whisper.transcribe.WhisperModel(model_size, device=device, compute_type=compute_type)

# Monkey patch 3 (change n_mels: large-v3 uses 128 mel bins instead of 80)
from faster_whisper.feature_extractor import FeatureExtractor
model.feature_extractor = FeatureExtractor(feature_size=128)

# Monkey patch 4 (use the large-v3 tokenizer from transformers)
from transformers import AutoProcessor
model.hf_tokenizer = AutoProcessor.from_pretrained("openai/whisper-large-v3").tokenizer
model.hf_tokenizer.token_to_id = lambda token: model.hf_tokenizer.convert_tokens_to_ids(token)

segments, transcription_info = model.transcribe(
    filename,
    word_timestamps=word_timestamps,
    vad_filter=vad_filter,
    temperature=temperature,
    language=language,
    initial_prompt=initial_prompt,
)
print(transcription_info)

start_time = time.time()
for segment in segments:
    row = {
        "start": segment.start,
        "end": segment.end,
        "text": segment.text,
    }
    if word_timestamps:
        row["words"] = [
            {"start": word.start, "end": word.end, "word": word.word}
            for word in segment.words
        ]
    print(row)
end_time = time.time()
print(f"Transcription finished in {end_time - start_time:.2f}s")
```
|
|
|
## Converting |
|
|
|
If you'd like to convert the model yourself, execute: |
|
|
|
```shell
pip install -U 'ctranslate2>=3.21.0' 'transformers==4.35.0' 'OpenNMT-py==2.*' sentencepiece
ct2-transformers-converter --model openai/whisper-large-v3 --output_dir whisper-large-v3-ct2
```
|
|
|
Then, the files will be at `whisper-large-v3-ct2/`. |
|
|
|
|
|
## License |
|
|
|
These files have the same license as the original [openai/whisper-large-v3 model](https://huggingface.co/openai/whisper-large-v3): Apache 2.0.
|
|