Whisper small adapters model for Greek transcription
We added adapters to the whisper-small model and then fine-tuned it on Greek ASR. During training, the base model is frozen and only the adapters are trained. To transcribe Greek, activate the adapters; for any other language, disable them and the model behaves like the original Whisper model.
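To illustrate how adapters can be switched on and off without touching the base model, here is a toy pure-Python sketch. It assumes a residual bottleneck adapter (down-projection, ReLU, up-projection, added back to the input), which is a common design; this card does not state which adapter architecture the model actually uses. The `BottleneckAdapter` class and `matvec` helper are hypothetical names used only for illustration. The sketch covers the forward pass only, not training.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class BottleneckAdapter:
    """Toy residual adapter: output = hidden + up(relu(down(hidden))).

    Assumption: this bottleneck layout is illustrative only and is not
    confirmed to match the adapters in this checkpoint.
    """

    def __init__(self, hidden_size, bottleneck_size, seed=0):
        rng = random.Random(seed)
        # Small random down-projection: hidden_size -> bottleneck_size.
        self.down = [
            [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
            for _ in range(bottleneck_size)
        ]
        # Zero-initialised up-projection, so an untrained adapter
        # leaves the hidden state unchanged.
        self.up = [[0.0] * bottleneck_size for _ in range(hidden_size)]

    def __call__(self, hidden, use_adapters=True):
        if not use_adapters:
            # Adapters disabled: pass the base model's hidden state through.
            return list(hidden)
        bottleneck = [max(0.0, v) for v in matvec(self.down, hidden)]
        update = matvec(self.up, bottleneck)
        return [h + u for h, u in zip(hidden, update)]
```

With `use_adapters=False` the hidden state passes through untouched, which mirrors the way the base Whisper weights stay usable for other languages.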
How to use
Start by installing the version of transformers that includes the Whisper model with added adapters:
git clone https://gitlab.com/horizon-europe-voxreality/multilingual-translation/speech-translation-demo.git
cd speech-translation-demo
# You might need to switch to dev branch
pip install -e transformers
The use_adapters parameter controls whether the adapters are used. Set it to True only when transcribing Greek.
from transformers import WhisperProcessor, WhisperForConditionalGenerationWithAdapters
from datasets import Audio, load_dataset
# load model and processor
processor = WhisperProcessor.from_pretrained("voxreality/whisper-small-el-adapters")
model = WhisperForConditionalGenerationWithAdapters.from_pretrained("voxreality/whisper-small-el-adapters")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="greek", task="transcribe")
# load streaming dataset and read first audio sample
ds = load_dataset("mozilla-foundation/common_voice_11_0", "el", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]
input_features = processor(input_speech["array"], sampling_rate=input_speech["sampling_rate"], return_tensors="pt").input_features
# Set use_adapters to False for languages other than Greek.
# generate token ids
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids, use_adapters=True)
# decode token ids to text
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
You can also use an HF pipeline:
from transformers import pipeline, WhisperForConditionalGenerationWithAdapters
from datasets import Audio, load_dataset
ds = load_dataset("mozilla-foundation/common_voice_11_0", "el", split="test", streaming=True)
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
input_speech = next(iter(ds))["audio"]
model = WhisperForConditionalGenerationWithAdapters.from_pretrained("voxreality/whisper-small-el-adapters")
# the tokenizer and feature extractor are both loaded from the same checkpoint
pipe = pipeline("automatic-speech-recognition", model=model, tokenizer="voxreality/whisper-small-el-adapters",
                feature_extractor="voxreality/whisper-small-el-adapters", device='cpu', batch_size=32)
transcription = pipe(input_speech['array'], generate_kwargs={"language": "<|el|>", "task": "transcribe", "use_adapters": True})
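If the same model serves several languages, the use_adapters choice can be derived from the language instead of being hard-coded. The following is a minimal sketch: `build_generate_kwargs` and `GREEK_ALIASES` are hypothetical names that are not part of the library, and the sketch assumes the installed fork accepts the language as a token ("<|el|>"), a code ("el"), or a name ("greek"), as upstream Whisper generation does.

```python
# Spellings of Greek that should switch the adapters on (assumed set).
GREEK_ALIASES = {"el", "greek", "<|el|>"}


def build_generate_kwargs(language: str, task: str = "transcribe") -> dict:
    """Return generation kwargs with use_adapters enabled only for Greek."""
    normalized = language.strip().lower()
    return {
        "language": normalized,
        "task": task,
        "use_adapters": normalized in GREEK_ALIASES,
    }
```

The resulting dictionary could then be passed as `generate_kwargs=build_generate_kwargs("greek")` in the pipeline call, or unpacked into `model.generate(input_features, **build_generate_kwargs("greek"))`.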