---
language:
- hi
license: apache-2.0
base_model: openai/whisper-medium
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: Whisper Medium finetuned Hindi
results:
- task:
name: Automatic Speech Recognition
type: automatic-speech-recognition
dataset:
name: common_voice_11_0
type: mozilla-foundation/common_voice_11_0
config: hi
split: test
args: hi
metrics:
- name: Wer
type: wer
value: 99.8077099166743
---
# iVaani - Fine-tuned ASR Model for Hindi
## Model Description
iVaani is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) optimized for the Hindi language. Fine-tuning improved transcription accuracy by 2.5% compared to the original Whisper medium model.
## Performance
On Hindi audio, the fine-tuned model shows a 2.5% increase in transcription accuracy over the base Whisper medium model. The word error rate (WER) on the Common Voice 11.0 Hindi test split is reported in the model metadata above.
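The reported WER can be recomputed with the `evaluate` library. The sketch below is a minimal, hedged example: it assumes access to the gated `mozilla-foundation/common_voice_11_0` dataset and scores only a small subset for speed, and the generation settings follow the standard Whisper evaluation recipe rather than anything stated in this card.

```python
import evaluate
from datasets import Audio, load_dataset
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("rukaiyah-indika-ai/iVaani")
processor = WhisperProcessor.from_pretrained("rukaiyah-indika-ai/iVaani")
wer_metric = evaluate.load("wer")

# Common Voice 11.0 Hindi test split, resampled to Whisper's expected 16 kHz
ds = load_dataset("mozilla-foundation/common_voice_11_0", "hi", split="test")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

predictions, references = [], []
for sample in ds.select(range(100)):  # small subset for a quick sanity check
    inputs = processor(sample["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    predicted_ids = model.generate(inputs.input_features)
    predictions.append(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
    references.append(sample["sentence"])

print("WER:", 100 * wer_metric.compute(predictions=predictions, references=references))
```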
## How to Use
Because this is a fine-tuned Whisper checkpoint, load it with the Whisper sequence-to-sequence classes from the Transformers library. Here is a Python snippet for transcribing a Hindi audio file:
```python
import librosa
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("rukaiyah-indika-ai/iVaani")
processor = WhisperProcessor.from_pretrained("rukaiyah-indika-ai/iVaani")

# Replace 'path_to_audio_file' with the path to your Hindi audio file (loaded at 16 kHz)
speech, _ = librosa.load("path_to_audio_file", sr=16000)
input_features = processor(speech, sampling_rate=16000, return_tensors="pt").input_features

# Perform the transcription and decode the generated token ids to text
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
```
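Alternatively, the high-level `pipeline` API handles audio loading, resampling, and decoding in one call. This is a convenience sketch rather than usage documented by the model authors; the `chunk_length_s` value and the `language`/`task` generation arguments are assumptions for long-form Hindi audio.

```python
from transformers import pipeline

# Assumed convenience wrapper; 30 s chunking mirrors Whisper's context window
asr = pipeline(
    "automatic-speech-recognition",
    model="rukaiyah-indika-ai/iVaani",
    chunk_length_s=30,
)
result = asr(
    "path_to_audio_file",
    generate_kwargs={"language": "hi", "task": "transcribe"},
)
print(result["text"])
```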
## Additional Language Models
Indika AI has also fine-tuned ASR (Automatic Speech Recognition) models for several other Indic languages, improving accuracy by 2-5% and significantly reducing the word error rate for each language. The additional languages, along with the original (pre-fine-tuning) accuracy for each, are listed below:
| Language | Original Accuracy |
|------------|-------------------|
| Bengali | 88% |
| Telugu | 86% |
| Marathi | 87% |
| Tamil | 88% |
| Gujarati | 90% |
| Kannada | 86.5% |
| Malayalam | 87.5% |
| Punjabi | 89% |
| Odia | 88.5% |
## BibTeX entry and citation info
If you use this model in your research, please cite it as follows:
```bibtex
@misc{whisper-medium-hindi-fine-tuned,
  author    = {Indika AI},
  title     = {iVaani},
  year      = {2024},
  publisher = {Hugging Face},
  journal   = {Hugging Face Model Hub}
}
```
## Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
- mixed_precision_training: Native AMP
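For reference, these settings map onto `Seq2SeqTrainingArguments` roughly as sketched below. This is an assumption-laden reconstruction, not the authors' training script; in particular, `output_dir` and the `adamw_torch` optimizer name are placeholders chosen to match the listed values.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the hyperparameters listed above
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-medium-hi",   # assumed output path
    learning_rate=1e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,      # effective train batch size of 4
    optim="adamw_torch",                # Adam with betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_steps=100,
    max_steps=1000,
    fp16=True,                          # native AMP mixed-precision training
)
```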
## Framework versions
- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0