rukaiyah-indika-ai/iVaani

	---
	language:
	- hi
	license: apache-2.0
	base_model: openai/whisper-medium
	tags:
	- whisper-event
	- generated_from_trainer
	datasets:
	- mozilla-foundation/common_voice_11_0
	metrics:
	- wer
	model-index:
	- name: Whisper Medium finetuned Hindi
	  results:
	  - task:
	      name: Automatic Speech Recognition
	      type: automatic-speech-recognition
	    dataset:
	      name: common_voice_11_0
	      type: mozilla-foundation/common_voice_11_0
	      config: hi
	      split: test
	      args: hi
	    metrics:
	    - name: Wer
	      type: wer
	      value: 99.8077099166743
	---

	# Fine-tuned Whisper Medium for Hindi Language

	# Model Description
	This model is a fine-tuned version of OpenAI's Whisper medium model (`openai/whisper-medium`) for Hindi automatic speech recognition. It was fine-tuned on the Hindi configuration of Common Voice 11.0 (`mozilla-foundation/common_voice_11_0`).

	# Performance
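	After fine-tuning, the model is reported to transcribe Hindi audio 2.5% more accurately than the base Whisper medium model. The card does not state the evaluation set behind this figure or whether the gain is absolute or relative. The model-index metadata lists a word error rate (WER) of 99.81 on the Hindi test split of Common Voice 11.0.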
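WER, the metric listed in this card's metadata, is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that definition in plain Python. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not the evaluation code used for this model, and the sketch does no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # Dynamic-programming edit distance, keeping only the previous row.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(
                min(
                    previous[j] + 1,  # deletion: reference word missing from output
                    current[j - 1] + 1,  # insertion: extra word in output
                    previous[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous = current

    return 100.0 * previous[-1] / len(ref_words)


# Illustrative sentences (hypothetical, not taken from the evaluation data):
# one substituted word out of four reference words.
print(word_error_rate("मेरा नाम राम है", "मेरा नाम श्याम है"))  # 25.0
```

Because insertions count as errors, WER can exceed 100% when the output contains many more words than the reference.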

	# How to Use
	You can load this model with the Hugging Face `transformers` library and transcribe a Hindi audio file. The following Python snippet shows the basic flow:

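	```python
	import librosa  # assumed audio loader: pip install librosa
	from transformers import WhisperForConditionalGeneration, WhisperProcessor

	# Model id as given in this card. The hosting page lists the repository as
	# "rukaiyah-indika-ai/iVaani"; use that id if this one does not resolve.
	model_id = "rukaiyah-indika-ai/whisper-medium-hindi-fine-tuned"
	model = WhisperForConditionalGeneration.from_pretrained(model_id)
	processor = WhisperProcessor.from_pretrained(model_id)

	# Replace 'path_to_audio_file' with the path to your Hindi audio file.
	# Whisper expects 16 kHz mono audio, so resample while loading.
	audio, sampling_rate = librosa.load("path_to_audio_file", sr=16000)
	inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")

	# Perform the transcription: generate token ids, then decode them to text
	predicted_ids = model.generate(inputs.input_features)
	transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
	print("Transcription:", transcription)
	```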

	# Additional Language Models
	Indika AI has also fine-tuned ASR (Automatic Speech Recognition) models for several other Indic languages,
	with reported accuracy improvements of 2.0% to 4.5% and word error rate reductions of 11% to 21% per language.

	The additional languages include:

	| Language  | Original Accuracy | Accuracy Improvement | Word Error Rate Reduction |
	|-----------|-------------------|----------------------|---------------------------|
	| Bengali   | 88%               | +3.5%                | -18%                      |
	| Telugu    | 86%               | +2.8%                | -15%                      |
	| Marathi   | 87%               | +4.2%                | -20%                      |
	| Tamil     | 85%               | +3.0%                | -17%                      |
	| Gujarati  | 84%               | +2.2%                | -12%                      |
	| Kannada   | 86.5%             | +4.5%                | -21%                      |
	| Malayalam | 87.5%             | +3.8%                | -19%                      |
	| Punjabi   | 83%               | +2.0%                | -11%                      |
	| Odia      | 88.5%             | +4.0%                | -20%                      |
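The table does not say whether its word error rate reductions are relative or absolute. As a hedged illustration of the difference, the sketch below computes both from a pair of WER values. The function names and the sample WER values are hypothetical and are not measurements from these models.

```python
def absolute_wer_reduction(baseline_wer: float, finetuned_wer: float) -> float:
    """Reduction in percentage points: baseline minus fine-tuned WER."""
    return baseline_wer - finetuned_wer


def relative_wer_reduction(baseline_wer: float, finetuned_wer: float) -> float:
    """Reduction as a percentage of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - finetuned_wer) / baseline_wer


# Hypothetical values chosen only to illustrate the arithmetic.
baseline, finetuned = 20.0, 16.4
print(f"absolute: {absolute_wer_reduction(baseline, finetuned):.1f} points")
print(f"relative: {relative_wer_reduction(baseline, finetuned):.1f}%")
```

With these sample values the same change reads as 3.6 points in absolute terms and 18.0% in relative terms, which is why the convention matters when comparing rows.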


	### BibTeX entry and citation info
	If you use this model in your research, please cite it as follows:

	```bibtex
	@misc{whisper-medium-hindi-fine-tuned,
	  author = {Indika AI},
	  title = {Fine-tuned Whisper Medium for Hindi Language},
	  year = {2024},
	  publisher = {Hugging Face},
	  journal = {Hugging Face Model Hub}
	}
	```

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 1e-05
	- train_batch_size: 2
	- eval_batch_size: 4
	- seed: 42
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 4
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 100
	- training_steps: 1000
	- mixed_precision_training: Native AMP
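Two of the values above follow from the others, and a short sketch can make the arithmetic concrete. It assumes a single training device (which is consistent with 2 × 2 = 4) and the usual linear schedule with warmup, where the learning rate rises linearly from 0 over the warmup steps and then decays linearly to 0 at the final step. The helper names are illustrative and this is not the training script used for the model.

```python
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000


def total_train_batch_size(num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Warmup phase: scale up linearly from 0 to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay phase: scale down linearly so the rate reaches 0 at the last step.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


print("total_train_batch_size:", total_train_batch_size())
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:>4}: lr = {linear_lr(step):.2e}")
```

With a total batch size of 4 and 1000 training steps, the run processes 4000 training examples in total, counting repeats.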


	### Framework versions

	- Transformers 4.35.2
	- PyTorch 2.1.0+cu121
	- Datasets 2.16.0
	- Tokenizers 0.15.0