avemio/German-RAG-WHISPER-LARGE-v3-TURBO-HESSIAN-AI

	---
	library_name: transformers
	language:
	- de
	license: mit
	base_model: openai/whisper-large-v3-turbo
	tags:
	- generated_from_trainer
	datasets:
	- MR-Eder/GER-TTS-50-Conversations
	metrics:
	- wer
	model-index:
	- name: Whisper Large -v3 Turbo German - GRAG
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: GER-TTS-50-Conversations
	      type: MR-Eder/GER-TTS-50-Conversations
	      config: default
	      split: None
	      args: 'config: de, split: test'
	    metrics:
	    - name: Wer
	      type: wer
	      value: 15.16864233785768
	---

	# GRAG-WHISPER-LARGE-v3-TURBO

	This model is a fine-tuned version of [openai/whisper-large-v3-turbo](https://huggingface.co/openai/whisper-large-v3-turbo), trained on a carefully curated 13-hour dataset.


	## Evaluations - Word error rate

	| Test dataset             | openai-whisper-large-v3-turbo | GRAG-WHISPER-LARGE-v3-TURBO | primeline-whisper-large-v3-turbo-german |
	|--------------------------|-------------------------------|-----------------------------|-----------------------------------------|
	| Tuda-De                  | 8.195                         | 6.360                       | 6.441                                   |
	| common_voice_19_0        | 3.839                         | 3.249                       | 3.217                                   |
	| multilingual librispeech | 3.202                         | 2.071                       | 2.067                                   |
	| All                      | 3.641                         | 2.633                       | 2.630                                   |

	The data and code for the evaluations are available [here](https://huggingface.co/datasets/avemio/ASR-GERMAN-MIXED-EVALS-GRAG).
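
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is not the evaluation code linked above. The function name `word_error_rate`, the lowercase normalization, and the scaling to a percentage are assumptions made for illustration, and real evaluations usually apply more thorough text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate in percent (assumed scale, for illustration)."""
    # Simplistic normalization (assumption): lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("das ist ein Test", "das ist kein Test"))  # 25.0
```

Lower values are better: a score of 0 means the transcript matches the reference word for word under the chosen normalization.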

	### Training data
	The training data for this model consists of spoken German conversations mixed with English business phrases. The data was carefully selected and processed to optimize recognition performance. The dataset will not be published because it is unclear whether it could be misused for voice cloning. The rights to the collected data cover only the intended use of training speech-to-text models.

	### How to use

	```python
	import torch
	from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
	from datasets import load_dataset

	# Use the GPU with half precision when available, otherwise fall back to CPU.
	device = "cuda:0" if torch.cuda.is_available() else "cpu"
	torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

	model_id = "avemio/GRAG-WHISPER-LARGE-v3-TURBO"

	model = AutoModelForSpeechSeq2Seq.from_pretrained(
	    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
	)
	model.to(device)

	processor = AutoProcessor.from_pretrained(model_id)

	# Build a pipeline that transcribes long audio in 30-second chunks
	# and returns timestamps alongside the text.
	pipe = pipeline(
	    "automatic-speech-recognition",
	    model=model,
	    tokenizer=processor.tokenizer,
	    feature_extractor=processor.feature_extractor,
	    max_new_tokens=128,
	    chunk_length_s=30,
	    batch_size=16,
	    return_timestamps=True,
	    torch_dtype=torch_dtype,
	    device=device,
	)

	# Long-form English sample from LibriSpeech, used here only as a smoke test;
	# substitute German audio for a meaningful check of this model.
	dataset = load_dataset("distil-whisper/librispeech_long", "clean", split="validation")
	sample = dataset[0]["audio"]

	result = pipe(sample)
	print(result["text"])
	```
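
Because the pipeline above is created with `return_timestamps=True`, its result also carries a `chunks` list next to `text`, where each entry has a `timestamp` pair and a `text` string. The sketch below shows one way to print those segments line by line. The helper name `format_chunks` and the output layout are hypothetical, and the sample data in the usage part is made up for illustration rather than produced by the model.

```python
def format_chunks(chunks):
    """Turn pipeline-style chunks into readable '[start -> end] text' lines."""

    def label(seconds, fallback):
        # The pipeline may report None when a boundary could not be determined.
        return fallback if seconds is None else f"{seconds:.2f}s"

    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        lines.append(f"[{label(start, 'start')} -> {label(end, 'end')}] {text}")
    return lines


if __name__ == "__main__":
    # Made-up chunks in the same shape as result["chunks"].
    example = [
        {"timestamp": (0.0, 2.5), "text": " Hallo Welt."},
        {"timestamp": (2.5, None), "text": " Guten Tag."},
    ]
    for line in format_chunks(example):
        print(line)
```

With the pipeline from the example above, the call would be `format_chunks(result["chunks"])`.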


	### Framework versions

	- Transformers 4.47.1
	- PyTorch 2.5.1+cu121
	- Datasets 3.2.0
	- Tokenizers 0.21.0
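
To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch using only the standard library. The helper name `compare_versions` is hypothetical, and the PyPI distribution names in `EXPECTED` (for example `torch` for PyTorch) are assumptions about how the packages were installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this model card, keyed by assumed distribution name.
EXPECTED = {
    "transformers": "4.47.1",
    "torch": "2.5.1+cu121",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def compare_versions(expected):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, matches) in compare_versions(EXPECTED).items():
        status = "ok" if matches else "differs"
        print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily mean the model will fail to load; the list only records the versions used during training.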