---
license: cc-by-nc-nd-4.0
datasets:
- openslr
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- whisper
- whisper-base
---

# whisper-base-gl

## Description

This is a fine-tuned version of the [openai/whisper-base](https://huggingface.co/openai/whisper-base) pre-trained model for ASR in Galician.

---

## Dataset

We used one of the datasets available in the OpenSLR repository: the [OpenSLR Galician](https://huggingface.co/datasets/openslr/viewer/SLR77) corpus.
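If you want to inspect the corpus yourself, it can be loaded with the `datasets` library. This is a minimal sketch, assuming the Hub's `openslr` loader exposes the Galician corpus under the `SLR77` configuration with a `sentence` transcription column:

```python
from datasets import load_dataset

# Assumption: the "openslr" loader with the "SLR77" config points at the
# Galician corpus; newer datasets versions may require trust_remote_code=True.
slr77 = load_dataset("openslr", "SLR77", split="train")
print(slr77[0]["sentence"])  # inspect one transcription
```

---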

## Example inference script

### Check this example script to run our model in inference mode

```python
import torch
import librosa  # needed for the audio loading below
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000

# Load the fine-tuned processor and model from the Hub
processor = AutoProcessor.from_pretrained('ITG/whisper-base-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-base-gl')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

with torch.no_grad():
    # Load the audio and resample it to the 16 kHz rate Whisper expects
    speech_array, _ = librosa.load(filename, sr=sample_rate)
    # Convert the waveform into log-Mel input features
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    # Generate token ids and decode them into the transcription
    generated_ids = model.generate(input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(f"ASR Galician whisper-base output: {decode_output}")
```
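The script above depends on `torch`, `transformers`, and `librosa`. As a shorter alternative (a minimal sketch using the generic `transformers` ASR pipeline, not taken from the original card), the same checkpoint can also be run through the high-level helper:

```python
from transformers import pipeline

# Assumption: the checkpoint works with the generic ASR pipeline
asr = pipeline("automatic-speech-recognition", model="ITG/whisper-base-gl")
print(asr("demo.wav")["text"])  # transcribe the same example audio file
```

---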

## Fine-tuning hyper-parameters

| **Hyper-parameter**         | **Value** |
|:---------------------------:|:---------:|
| Training batch size         | 16        |
| Evaluation batch size       | 8         |
| Learning rate               | 3e-5      |
| Gradient checkpointing      | true      |
| Gradient accumulation steps | 1         |
| Max training epochs         | 100       |
| Max steps                   | 4000      |
| Generate max length         | 225       |
| Warmup training steps (%)   | 12.5%     |
| FP16                        | true      |
| Metric for best model       | wer       |
| Greater is better           | false     |
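The exact training script is not included in this card, but as a rough sketch the table translates into the `transformers` `Seq2SeqTrainingArguments` below. The `output_dir` and the absolute `warmup_steps` value (12.5% of the 4000 max steps = 500) are our own assumptions:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-base-gl",   # hypothetical output path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=3e-5,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    num_train_epochs=100,
    max_steps=4000,                   # max_steps takes precedence over epochs
    warmup_steps=500,                 # 12.5% of 4000 steps
    fp16=True,
    generation_max_length=225,
    predict_with_generate=True,       # assumption: required to compute WER
    metric_for_best_model="wer",
    greater_is_better=False,
)
```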

## Fine-tuning in a different dataset or style

If you're interested in fine-tuning your own Whisper model, we suggest starting with the [openai/whisper-base model](https://huggingface.co/openai/whisper-base). Additionally, you may find the Transformers step-by-step guide for [fine-tuning Whisper on multilingual ASR datasets](https://huggingface.co/blog/fine-tune-whisper) to be a valuable resource. This guide served as a helpful reference during the training process of this Galician whisper-base model!
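Following that guide's setup, a minimal starting point looks like the sketch below; the `language`/`task` arguments configure the tokenizer so that Galician transcription tokens are prepended during training:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the multilingual base checkpoint and configure it for Galician ASR
processor = WhisperProcessor.from_pretrained(
    "openai/whisper-base", language="galician", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-base")
```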