---
license: mit
language:
- pt
base_model:
- distil-whisper/distil-large-v3
pipeline_tag: automatic-speech-recognition
tags:
- asr
- pt
- ptbr
- stt
- speech-to-text
- automatic-speech-recognition
---
# Distil-Whisper-Large-v3 for Brazilian Portuguese

This model is a fine-tuned version of `distil-whisper/distil-large-v3` for automatic speech recognition (ASR) in Brazilian Portuguese. It was trained on the Common Voice 16 dataset together with a private dataset whose transcripts were generated by Whisper Large v3.

## Model Description

The model transcribes Brazilian Portuguese speech. Trained on Common Voice 16 combined with an automatically transcribed private dataset, it achieves a Word Error Rate (WER) of 8.93% on the Common Voice 16 validation set.

- **Model type:** Speech recognition model based on distil-whisper-large-v3
- **Language(s) (NLP):** Brazilian Portuguese (pt-BR)
- **License:** MIT
- **Finetuned from model:** distil-whisper/distil-large-v3
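
For reference, WER is the word-level edit distance (substitutions, deletions, and insertions) between the model output and the reference transcript, divided by the number of reference words. The sketch below illustrates that definition only. It assumes plain whitespace tokenization and no text normalization, and the function name `word_error_rate` is illustrative. The scoring setup behind the 8.93% figure is not described in this card, so results from this sketch may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences, not taken from the evaluation data:
# one substituted word out of five reference words.
print(word_error_rate("o gato está no telhado", "o gato esta no telhado"))  # 0.2
```

Because the denominator is the reference length, a hypothesis with many inserted words can score above 100%.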

## How to Get Started with the Model

You can use the model with the Transformers library:

```python   
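from datasets import Audio, load_dataset
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Load the validation split of the Common Voice dataset for Portuguese
common_voice = load_dataset("mozilla-foundation/common_voice_11_0", "pt", split="validation")

# Whisper's feature extractor expects 16 kHz audio, while Common Voice clips
# are typically stored at 48 kHz, so resample on load
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))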

# Load the pretrained model and processor
processor = WhisperProcessor.from_pretrained("freds0/distil-whisper-large-v3-ptbr")
model = WhisperForConditionalGeneration.from_pretrained("freds0/distil-whisper-large-v3-ptbr")

# Select a sample from the dataset
sample = common_voice[0]  # You can change the index to select a different sample

# Get the audio array and sampling rate
audio_input = sample["audio"]["array"]
sampling_rate = sample["audio"]["sampling_rate"]

# Preprocess the audio
input_features = processor(audio_input, sampling_rate=sampling_rate, return_tensors="pt").input_features

# Generate transcription
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print("Transcription:", transcription[0])
```