---
language:
- hi
license: apache-2.0
base_model: openai/whisper-medium
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
metrics:
- wer
model-index:
- name: Whisper Medium finetuned Hindi
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: common_voice_11_0
      type: mozilla-foundation/common_voice_11_0
      config: hi
      split: test
      args: hi
    metrics:
    - name: Wer
      type: wer
      value: 99.8077099166743
---

# iVaani - Fine-tuned ASR model for Hindi Language

# Model Description
iVaani is a fine-tuned version of `openai/whisper-medium`, optimized specifically for Hindi automatic speech recognition. Fine-tuning improved accuracy by 2.5% compared to the original Whisper model.

# Performance
After fine-tuning, the model transcribes Hindi audio 2.5% more accurately than the base Whisper medium model.
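
The metric listed for this model is word error rate (WER). As a rough illustration of how WER is usually computed, here is a minimal sketch in plain Python. The function name `word_error_rate` and the example sentences are hypothetical, and this is not necessarily the evaluation script used for this model. It also assumes simple whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of six reference words.
reference = "मैं आज स्कूल जा रहा हूँ"
hypothesis = "मैं आज स्कूल जा रही हूँ"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Lower WER is better. Accuracy figures such as the ones quoted here are often derived as `1 - WER`, but that relationship is an assumption and is not stated in this model card.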

# How to Use
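You can load this model with the Hugging Face Transformers library. Because it is a fine-tuned Whisper checkpoint, use the Whisper model and processor classes rather than the CTC/Wav2Vec2 ones. Here is a Python code snippet for using the model:

```python
import librosa  # extra dependency, used here only to load and resample audio
from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_id = "rukaiyah-indika-ai/iVaani"
processor = WhisperProcessor.from_pretrained(model_id)
model = WhisperForConditionalGeneration.from_pretrained(model_id)

# Replace 'path_to_audio_file' with the path to your Hindi audio file.
# Whisper expects 16 kHz mono audio, so resample while loading.
path_to_audio_file = "path_to_audio_file"
audio, _ = librosa.load(path_to_audio_file, sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")

# Generate token IDs, then decode them into text
predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("Transcription:", transcription)
```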

# Additional Language Models
Indika AI has also fine-tuned ASR (Automatic Speech Recognition) models for several other Indic languages, improving accuracy by 2-5% for each language and reducing the word error rate.

The additional languages and their original accuracy are:

| Language   | Original Accuracy | 
|------------|-------------------|
| Bengali    | 88%               | 
| Telugu     | 86%               | 
| Marathi    | 87%               | 
| Tamil      | 88%               | 
| Gujarati   | 90%               | 
| Kannada    | 86.5%             | 
| Malayalam  | 87.5%             | 
| Punjabi    | 89%               | 
| Odia       | 88.5%             |


### BibTeX entry and citation info
If you use this model in your research, please cite it as follows:

```bibtex
@misc{whisper-medium-hindi-fine-tuned,
  author = {Indika AI},
  title = {iVaani},
  year = {2024},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub}
}
```

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 2
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
- mixed_precision_training: Native AMP
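
To make the relationship between these values concrete, the sketch below reproduces the arithmetic behind `total_train_batch_size` and the shape of a linear schedule with warmup. It is an illustration in plain Python, not the training script. The helper names `effective_batch_size` and `linear_lr` are hypothetical, and a single training device is assumed.

```python
# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 2
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_STEPS = 100
TRAINING_STEPS = 1000


def effective_batch_size(num_devices: int = 1) -> int:
    """Samples contributing to each optimizer update (single device assumed)."""
    return TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def linear_lr(step: int) -> float:
    """Linear warmup from 0 to the peak rate, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


print("effective batch size:", effective_batch_size())  # 2 * 2 * 1 = 4
for step in (0, 50, 100, 550, 1000):
    print(f"step {step:4d}: lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate peaks at `1e-05` at step 100 and falls back to zero at step 1000.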


### Framework versions

- Transformers 4.35.2
- PyTorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0