--- language: - hi license: apache-2.0 base_model: openai/whisper-medium tags: - whisper-event - generated_from_trainer datasets: - mozilla-foundation/common_voice_11_0 metrics: - wer model-index: - name: Whisper Medium finetuned Hindi results: - task: name: Automatic Speech Recognition type: automatic-speech-recognition dataset: name: common_voice_11_0 type: mozilla-foundation/common_voice_11_0 config: hi split: test args: hi metrics: - name: Wer type: wer value: 99.8077099166743 --- # Fine-tuned Whisper Medium for Hindi Language # Model Description This model is a fine-tuned version of OpenAI's Whisper medium model, specifically optimized for the Hindi language. The fine-tuning process has led to an improvement in accuracy by 2.5% compared to the original Whisper model. # Performance After fine-tuning, the model shows a 2.5% increase in transcription accuracy for Hindi language audio compared to the base Whisper medium model. # How to Use You can use this model directly with a simple API call in Hugging Face. Here is a Python code snippet for using the model: ```python from transformers import AutoModelForCTC, Wav2Vec2Processor model = AutoModelForCTC.from_pretrained("rukaiyah-indika-ai/whisper-medium-hindi-fine-tuned") processor = Wav2Vec2Processor.from_pretrained("rukaiyah-indika-ai/whisper-medium-hindi-fine-tuned") # Replace 'path_to_audio_file' with the path to your Hindi audio file input_audio = processor(path_to_audio_file, return_tensors="pt", padding=True) # Perform the transcription transcription = model.generate(**input_audio) print("Transcription:", transcription) ``` # Additional Language Models Indika AI has also fine-tuned ASR (Automatic Speech Recognition) models for several other Indic languages, enhancing the accuracy by 2-5% for each language. The word error rate has also been significantly reduced. The additional languages include: | Language | Original Accuracy | Accuracy Improvement | Word Error Rate Reduction| |------------|-------------------|----------------------|--------------------------| | Bengali | 88% | +3.5% | -18% | | Telugu | 86% | +2.8% | -15% | | Marathi | 87% | +4.2% | -20% | | Tamil | 85% | +3.0% | -17% | | Gujarati | 84% | +2.2% | -12% | | Kannada | 86.5% | +4.5% | -21% | | Malayalam | 87.5% | +3.8% | -19% | | Punjabi | 83% | +2.0% | -11% | | Odia | 88.5% | +4.0% | -20% | ### BibTeX entry and citation info If you use this model in your research, please cite it as follows: ```bibtex @misc{whisper-medium-hindi-fine-tuned, author = {Indika AI}, title = {Fine-tuned Whisper Medium for Hindi Language}, year = {2024}, publisher = {Hugging Face}, journal = {Hugging Face Model Hub} } ``` ### Training hyperparameters The following hyperparameters were used during training: - learning_rate: 1e-05 - train_batch_size: 2 - eval_batch_size: 4 - seed: 42 - gradient_accumulation_steps: 2 - total_train_batch_size: 4 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 - lr_scheduler_type: linear - lr_scheduler_warmup_steps: 100 - training_steps: 1000 - mixed_precision_training: Native AMP ### Framework versions - Transformers 4.35.2 - Pytorch 2.1.0+cu121 - Datasets 2.16.0 - Tokenizers 0.15.0