Commit 402de0b (verified) by SujithPulikodan · 1 Parent(s): 59751c6

Update README.md

  1. README.md +30 -38
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- license: mit
3
  datasets:
4
  - ARTPARK-IISc/Vaani
5
  language:
@@ -8,53 +8,45 @@ base_model:
8
  - openai/whisper-tiny
9
  pipeline_tag: automatic-speech-recognition
10
  ---
11
- ```python
12
-
13
-
14
- import torch
15
- from transformers import WhisperForConditionalGeneration, WhisperProcessor, WhisperTokenizer,WhisperFeatureExtractor
16
- import soundfile as sf
17
-
18
 
19
- model="ARTPARK-IISc/whisper-tiny-vaani-hindi"
20
 
21
- # Load tokenizer and feature extractor individually
22
- feature_extractor = WhisperFeatureExtractor.from_pretrained(model)
23
- tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-tiny", language="Hindi", task="transcribe")
24
 
 
25
 
26
- # Create the processor manually
27
- processor = WhisperProcessor(feature_extractor=feature_extractor, tokenizer=tokenizer)
28
-
29
- # Load and preprocess the audio file
30
- audio_file_path = "Sample_Audio.wav" # replace with your audio file path
31
-
32
-
33
- device = "cuda" if torch.cuda.is_available() else "cpu"
34
 
35
- # Load the processor and model
36
- model = WhisperForConditionalGeneration.from_pretrained(model).to(device)
37
 
 
 
 
 
 
38
 
39
- # load audio
40
- audio_data, sample_rate = sf.read(audio_file_path)
41
- # Ensure the audio is 16kHz (Whisper expects 16kHz audio)
42
- if sample_rate != 16000:
43
- import torchaudio
44
- resampler = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=16000)
45
- audio_data = resampler(torch.tensor(audio_data).unsqueeze(0)).squeeze().numpy()
46
 
 
 
47
 
48
- # Use the processor to prepare the input features
49
- input_features = processor(audio_data, sampling_rate=16000, return_tensors="pt").input_features.to(device)
 
50
 
51
- # Generate transcription (disable gradient calculation during inference)
52
- with torch.no_grad():
53
- predicted_ids = model.generate(input_features)
54
 
55
- # Decode the generated IDs into human-readable text
56
- transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
57
 
58
- print(transcription)
59
 
60
- ```
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ license: apache-2.0
3
  datasets:
4
  - ARTPARK-IISc/Vaani
5
  language:
 
8
  - openai/whisper-tiny
9
  pipeline_tag: automatic-speech-recognition
10
  ---
 
 
 
 
 
 
 
11
 
 
12
 
13
+ # Whisper-tiny-vaani-hindi
 
 
14
 
15
+ This is a fine-tuned version of [OpenAI's Whisper-tiny](https://huggingface.co/openai/whisper-tiny), trained on approximately 718 hours of transcribed Hindi speech from multiple datasets.
16
 
17
+ # Usage
18
+ The model can be used with the `pipeline` function from the Transformers library.
19
+ ```python
 
 
 
 
 
20
 
21
+ import torch
22
+ from transformers import pipeline
23
 
24
+ audio = "path to the audio file to be transcribed"
25
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
26
+ modelTags = "ARTPARK-IISc/whisper-tiny-vaani-hindi"
27
+ transcribe = pipeline(task="automatic-speech-recognition", model=modelTags, chunk_length_s=30, device=device)
28
+ # Force Hindi transcription (rather than translation or language auto-detection)
+ transcribe.model.config.forced_decoder_ids = transcribe.tokenizer.get_decoder_prompt_ids(language="hi", task="transcribe")
29
 
30
+ print("Transcription:", transcribe(audio)["text"])
 
 
 
 
 
 
31
 
32
+ ```
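Newer releases of Transformers favor passing the language and task at call time through `generate_kwargs` instead of writing `forced_decoder_ids` into the model config. The sketch below shows that variant. It is an assumption, not part of this commit: the helper names `build_generate_kwargs` and `transcribe_file` are hypothetical, and whether the `language` argument is accepted depends on the installed Transformers version and on the generation config shipped with the checkpoint.

```python
def build_generate_kwargs(language="hindi", task="transcribe"):
    """Hypothetical helper: decoding options forwarded to Whisper's generate()."""
    return {"language": language, "task": task}


def transcribe_file(audio_path, model_id="ARTPARK-IISc/whisper-tiny-vaani-hindi"):
    """Hypothetical helper: transcribe one audio file with the ASR pipeline."""
    # Heavy imports are deferred so the module can be imported without them.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    asr = pipeline(
        task="automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,
        device=device,
    )
    # Language and task are supplied per call rather than stored in the config.
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]
```

Calling `transcribe_file("Sample_Audio.wav")` would then return the transcription string; if the call rejects `language`, the config-based approach in the README remains the fallback.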
33
+ # Training and Evaluation
34
 
35
+ The model was fine-tuned on the following datasets: [Vaani](https://huggingface.co/datasets/ARTPARK-IISc/Vaani), [Gramvaani](https://sites.google.com/view/gramvaaniasrchallenge/dataset),
36
+ [IndicVoices](https://huggingface.co/datasets/ai4bharat/IndicVoices), [Fleurs](https://huggingface.co/datasets/google/fleurs), [IndicTTS](https://huggingface.co/datasets/SPRINGLab/IndicTTS-Hindi),
37
+ and [Commonvoice](https://huggingface.co/datasets/mozilla-foundation/common_voice_17_0).
38
 
 
 
 
39
 
40
+ The model was evaluated on multiple datasets; the word error rate (WER) on each is given below.
 
41
 
 
42
 
43
+ | Dataset | WER |
44
+ | :---: | :---: |
45
+ | Gramvaani | 42.34 |
46
+ | Fleurs | 26.39 |
47
+ | IndicTTS | 11.77 |
48
+ | MUCS | 39.00 |
49
+ | Commonvoice | 37.95 |
50
+ | Kathbath | 23.91 |
51
+ | Kathbath Noisy | 29.92 |
52
+ | Vaani | 33.33 |
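
The README does not state how the WER figures above were computed or which text normalization was applied. As a rough illustration of the metric only, the sketch below computes a word-level edit distance between a reference and a hypothesis and reports it as a percentage. It assumes plain whitespace tokenization and no normalization, so it will not necessarily reproduce the numbers in the table; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("मेरा नाम राम है", "मेरा नाम श्याम है"))  # 25.0
```

For corpus-level scores, the edit counts are usually summed over all utterances and divided by the total number of reference words, rather than averaging per-utterance WER.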