Update README.md
README.md CHANGED

@@ -9,4 +9,18 @@ pipeline_tag: text-to-image
 tags:
 - medical
 - free tags
----
+---
+
+# Whisper
+
+Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours
+of labelled data, Whisper models demonstrate a strong ability to generalise to many datasets and domains **without** the need
+for fine-tuning.
+
+Whisper was proposed in the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356)
+by Alec Radford et al. from OpenAI. The original code repository can be found [here](https://github.com/openai/whisper).
+
+Whisper `large-v3` has the same architecture as the previous large models except the following minor differences:
+
+1. The input uses 128 Mel frequency bins instead of 80
+2. A new language token for Cantonese
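The first difference listed in the diff (128 Mel frequency bins instead of 80) concerns the shape of the model's log-mel input features. A minimal numpy sketch of a triangular mel filterbank illustrates what changes between the two settings, assuming Whisper's usual front-end parameters (16 kHz audio, 400-sample FFT window); the helper names here are hypothetical and not taken from Whisper's actual code:

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale conversion
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft=400, sr=16000):
    """Triangular mel filterbank: maps an FFT magnitude spectrum
    (n_fft // 2 + 1 frequency bins) down to n_mels mel-spaced bins."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    # n_mels + 2 edge points, evenly spaced on the mel scale
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

fb_80 = mel_filterbank(80)    # earlier large models: 80 mel bins
fb_128 = mel_filterbank(128)  # large-v3: 128 mel bins
print(fb_80.shape, fb_128.shape)  # (80, 201) (128, 201)
```

Only the number of rows in the filterbank (and hence the channel dimension of the log-mel spectrogram fed to the encoder) changes; the underlying FFT resolution stays the same.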