GRAG-WHISPER-LARGE-v3-TURBO-HESSIAN-AI

This model is fine-tuned on a carefully curated 13-hour dataset.

Evaluations - Word error rate

| Test dataset | openai-whisper-large-v3-turbo | GRAG-WHISPER-LARGE-v3-TURBO | primeline-whisper-large-v3-turbo-german |
|---|---|---|---|
| Tuda-De | 8.195 | 6.360 | 6.441 |
| common_voice_19_0 | 3.839 | 3.249 | 3.217 |
| multilingual librispeech | 3.202 | 2.071 | 2.067 |
| All | 3.641 | 2.633 | 2.630 |

The data and code for evaluations are available here
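For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate can be computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is an illustration only and not the evaluation code referred to above. The function name `word_error_rate` is hypothetical, and the sketch assumes plain whitespace tokenization with no text normalization, which a real evaluation would typically add.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of four reference words
print(word_error_rate("das ist ein test", "das ist test"))  # 0.25
```

The score is a fraction here; multiply by 100 if a percentage is wanted.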

Training data

The training data for this model consists of spoken German conversations interspersed with English business phrases. The data was carefully selected and processed to optimize recognition performance. The dataset will not be published because it is unclear whether the data could be misused for voice cloning. The rights to the collected data cover only the intended use: training speech-to-text models.

How to use

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
from datasets import load_dataset

# Use the GPU with half precision when available, otherwise CPU with float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "avemio/GRAG-WHISPER-LARGE-v3-TURBO"

# Load the fine-tuned Whisper weights and move them to the selected device
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

# The processor bundles the tokenizer and the audio feature extractor
processor = AutoProcessor.from_pretrained(model_id)

# Build an ASR pipeline that splits long audio into 30-second chunks
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

# Demo input: a long-form LibriSpeech sample (English audio); swap in your own German recording
dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
sample = dataset[0]["audio"]
result = pipe(sample)
print(result["text"])
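Because the pipeline above is created with `return_timestamps=True`, the result also carries a `chunks` list next to `text`, where each entry has a `timestamp` tuple and its own `text`. The sketch below shows one way to print those segments. The helper name `format_chunks` and the sample dictionary are made up for illustration; the sample is not real model output.

```python
def format_chunks(result: dict) -> list[str]:
    """Render each timestamped chunk as '[start - end] text'."""

    def label(seconds):
        # A boundary can be None, e.g. when the audio ends mid-segment
        return f"{seconds:.2f}" if seconds is not None else "?"

    lines = []
    for chunk in result.get("chunks", []):
        start, end = chunk["timestamp"]
        lines.append(f"[{label(start)} - {label(end)}] {chunk['text'].strip()}")
    return lines


# Hypothetical result in the shape the ASR pipeline returns with timestamps
example_result = {
    "text": " Guten Morgen. Das Meeting beginnt um zehn.",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": " Guten Morgen."},
        {"timestamp": (1.5, None), "text": " Das Meeting beginnt um zehn."},
    ],
}

for line in format_chunks(example_result):
    print(line)
```

With a real transcription, pass the `result` returned by `pipe(sample)` instead of `example_result`.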

Framework versions

  • Transformers 4.47.1
  • PyTorch 2.5.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0
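
To compare a local environment against the versions listed above, a small check like the following may help. It is only a sketch: the helper name `compare_versions` is hypothetical, and it assumes the packages are installed under the distribution names `transformers`, `torch`, `datasets`, and `tokenizers`. Other versions may work as well; the list only records what the model was trained with.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions taken from the list above
EXPECTED = {
    "transformers": "4.47.1",
    "torch": "2.5.1+cu121",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package to (expected version, installed version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{package}: expected {wanted}, installed {installed} ({status})")
```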