How To Use

Here is plug-and-play inference code:

import torchaudio
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("eddiegulay/Whisperer_Mozilla_Sw_2000")
model = WhisperForConditionalGeneration.from_pretrained("eddiegulay/Whisperer_Mozilla_Sw_2000")
forced_decoder_ids = processor.get_decoder_prompt_ids(language="swahili", task="transcribe")

def transcribe(audio_path):
  # Load the audio file
  audio_input, sample_rate = torchaudio.load(audio_path)
  target_sample_rate = 16000
  audio_input = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sample_rate)(audio_input)

  # Preprocess the audio data
  input_features = processor(audio_input[0], sampling_rate=target_sample_rate, return_tensors="pt").input_features

  # Run inference to generate token ids
  predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)

  # Decode the token ids into text
  transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

  return transcription



transcribe('your_audio_file.mp3')
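
With default settings, the Whisper feature extractor pads or truncates its input to 30 seconds, so the function above will likely transcribe only the start of a longer recording. One way to handle this is to split the audio into windows of at most 30 seconds and transcribe each one. The sketch below only computes the window boundaries; `chunk_bounds` and its parameters are hypothetical names introduced for illustration and are not part of the model or of `transformers`.

```python
def chunk_bounds(num_samples, sample_rate=16000, chunk_seconds=30):
    """Return (start, end) sample indices that cover the audio in fixed-size windows.

    Hypothetical helper: the 30-second default is assumed to match the
    Whisper feature extractor's window length.
    """
    if num_samples < 0 or sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and chunk_seconds must be > 0")

    # Number of samples in one full window (integer arguments expected)
    chunk_size = sample_rate * chunk_seconds

    # The last window is clipped to the end of the audio
    return [
        (start, min(start + chunk_size, num_samples))
        for start in range(0, num_samples, chunk_size)
    ]


# Boundaries for a 70-second recording at 16 kHz
bounds = chunk_bounds(16000 * 70)
```

Each `(start, end)` pair could then be used to slice the resampled waveform (for example `audio_input[0][start:end]`) before passing it to `processor` and `model.generate` as in `transcribe`. Cutting at fixed boundaries can split a word in two, so overlapping windows or splitting on silence may give cleaner results.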

Model size: 764M params (Safetensors), tensor type F32