bofenghuang committed
Commit · 2875f30
1 Parent(s): 6b23d76
Add v0.2
Browse files
- README.md +50 -0
- hf-whisper-v4.3/{events.out.tfevents.1723289290.jzxh016.1378940.0 → events.out.tfevents.1729591939.jzxh176.969603.0} +2 -2
- hf-whisper-v4.3/events.out.tfevents.1729663979.jzxh093.1022978.0 +3 -0
- hf-whisper-v4.3/{events.out.tfevents.1723361596.jzxh020.602378.0 → events.out.tfevents.1729698618.jzxh269.1472456.0} +2 -2
- hf-whisper-v4.3/events.out.tfevents.1729714797.jzxh046.301692.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1729804962.jzxh195.623633.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1729878444.jzxh027.3387440.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1729928422.jzxh043.3632435.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1729944706.jzxh043.3643886.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1729991579.jzxh069.732416.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1730029175.jzxh019.1273419.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1730094586.jzxh019.1313067.0 +3 -0
- model.safetensors +1 -1
README.md
ADDED
@@ -0,0 +1,50 @@
# Whisper-Large-V3-Distil-French-v0.2

A distilled version of Whisper with 2 decoder layers, optimized for French speech-to-text.

Compared to [v0.1](https://huggingface.co/collections/bofenghuang/french-whisper-v01-64f9cc3cf625e46d12f0e4bd), this version extends the training to 30-second audio segments to maintain long-form transcription abilities. The training process used a ["patient" teacher](https://arxiv.org/abs/2106.05237) during distillation - meaning longer training times and more aggressive data augmentation - which improved overall performance.

The model uses [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) as the teacher model while keeping the encoder architecture unchanged. This makes it suitable as a draft model for speculative decoding: by adding only 2 extra decoder layers and running the encoder just once, it can potentially deliver 2x faster inference while guaranteeing identical outputs. It can also serve as a standalone model, trading some accuracy for better efficiency: it runs 5.8x faster while using only 49% of the parameters. This [paper](https://arxiv.org/abs/2311.00430) also suggests that the distilled model may actually produce fewer hallucinations than the full model during long-form transcription.
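As an illustrative sketch (not an official recipe from this card), speculative decoding with the 🤗 Transformers pipeline could look like the following; the repo id is taken from the tables below, and `audio.wav` is a placeholder path:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Teacher model: produces the final (verified) tokens.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3", torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)

# Distilled draft model: same encoder, only 2 decoder layers.
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "bofenghuang/whisper-large-v3-distil-fr-v0.2",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
).to(device)

processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
    # The teacher verifies the drafted tokens, so the output matches
    # plain large-v3 decoding exactly.
    generate_kwargs={"assistant_model": assistant_model},
)

print(pipe("audio.wav")["text"])  # "audio.wav" is a placeholder path
```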
The model has been converted into multiple formats to ensure broad compatibility across libraries, including transformers, openai-whisper, faster-whisper, whisper.cpp, candle, and mlx.
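For instance, with faster-whisper - a sketch assuming the CTranslate2 conversion is reachable directly under this repo id (it may instead live in a subfolder, in which case you would download that folder and pass its local path):

```python
from faster_whisper import WhisperModel

# Repo id assumed from the tables below; requires the CTranslate2 weights.
model = WhisperModel(
    "bofenghuang/whisper-large-v3-distil-fr-v0.2",
    device="cuda",
    compute_type="float16",
)

# "audio.wav" is a placeholder path
segments, info = model.transcribe("audio.wav", language="fr")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```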
## Performance

The model was evaluated on both short and long-form transcriptions, using in-distribution (ID) and out-of-distribution (OOD) datasets to assess accuracy, generalizability, and robustness.

Note that Word Error Rate (WER) results shown here are [post-normalization](https://github.com/openai/whisper/blob/main/whisper/normalizers/basic.py), which includes converting text to lowercase and removing symbols and punctuation.
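As a minimal sketch of that normalization step, using the `BasicTextNormalizer` from the linked openai-whisper module together with `jiwer` for the WER computation (the example strings are invented):

```python
from jiwer import wer
from whisper.normalizers import BasicTextNormalizer

normalizer = BasicTextNormalizer()  # lowercases text, strips symbols/punctuation

reference = "Bonjour, comment allez-vous ?"
hypothesis = "bonjour comment allez vous"

# Both strings normalize to "bonjour comment allez vous", giving a WER of 0.0
print(wer(normalizer(reference), normalizer(hypothesis)))
```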
All evaluation results on the public datasets can be found [here]().

### Short-Form Transcription
| Model | mcv17 | mls | voxpopuli | mtedx | af_accented | fleurs | zaion1 | zaion2 | zaion3 | zaion4 |
|-------|-------|-----|-----------|-------|-------------|--------|--------|--------|--------|--------|
| openai/whisper-large-v3 | 10.98 | 4.68 | 11.15 | 8.65 | 7.55 | 5.38 | 24.00 | 27.52 | 32.95 | 24.14 |
| openai/whisper-large-v3-turbo | 12.25 | 5.08 | 12.21 | 9.87 | 8.37 | 5.50 | 26.49 | 28.33 | 34.80 | 24.94 |
| bofenghuang/whisper-large-v3-french | ~~*8.95*~~ | *4.68* | *9.82* | *8.33* | *5.25* | *5.14* | 22.53 | 27.51 | 29.14 | 22.44 |
| bofenghuang/whisper-large-v3-french-distil-dec16 | ~~*8.86*~~ | *4.28* | *9.66* | *8.14* | *4.93* | *5.37* | 21.70 | 25.20 | 28.83 | 20.46 |
| bofenghuang/whisper-large-v3-french-distil-dec2 | ~~*10.52*~~ | *5.34* | *10.59* | *9.37* | *5.68* | *7.30* | 24.91 | 29.57 | 32.34 | 24.46 |
| eustlb/distil-large-v3-fr | *12.64* | *5.84* | 11.84 | 9.65 | 8.83 | 7.81 | 24.34 | 28.77 | 34.05 | 24.10 |
| bofenghuang/whisper-large-v3-distil-fr-v0.2 | *11.10* | *5.00* | *10.68* | *8.75* | *7.09* | 6.35 | 23.01 | 26.91 | 31.46 | 22.33 |

*Italic* indicates in-distribution (ID) evaluation, where test sets correspond to data distributions seen during training, typically yielding higher performance than out-of-distribution (OOD) evaluation. *~~Italic and strikethrough~~* denotes potential test set contamination - for example, when training and evaluation use different versions of Common Voice, raising the possibility of overlapping data.

Due to the limited availability of out-of-distribution (OOD) and long-form French test sets, evaluation was also performed using internal test sets from [Zaion Lab](https://zaion.ai/) - consisting of human-annotated call center conversations with significant background noise and domain-specific terminology.

### Long-Form Transcription
Long-form transcription evaluation used the 🤗 Hugging Face [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) with both [chunked](https://huggingface.co/blog/asr-chunking) (chunk_length_s=30) and original sequential decoding methods.
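A sketch of the chunked setup (repo id assumed from the tables; in recent transformers versions, omitting `chunk_length_s` falls back to Whisper's sequential long-form decoding):

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="bofenghuang/whisper-large-v3-distil-fr-v0.2",
    chunk_length_s=30,  # cut long audio into 30-second chunks decoded in parallel
    batch_size=8,       # number of chunks per forward pass
)

# "long_audio.wav" is a placeholder path; timestamps help inspect segment boundaries
result = pipe("long_audio.wav", return_timestamps=True)
print(result["text"])
```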
| Model | community-v2/dev_data | | mtedx | | zaion5 | | zaion6 | |
|-------|---------|------------|---------|------------|---------|------------|---------|------------|
| | chunked | sequential | chunked | sequential | chunked | sequential | chunked | sequential |
| openai/whisper-large-v3 | 9.89 | 8.97 | 9.00 | 8.01 | 40.76 | 30.49 | 32.08 | 25.56 |
| openai/whisper-large-v3-turbo | 10.11 | 9.00 | 8.49 | 8.45 | 34.59 | 29.35 | 30.00 | 24.84 |
| bofenghuang/whisper-large-v3-french | 9.33 | 9.99 | *9.85* | *9.49* | 35.92 | 29.01 | 29.03 | 23.55 |
| bofenghuang/whisper-large-v3-french-distil-dec16 | 8.97 | 10.11 | *9.61* | *11.72* | 27.14 | 27.57 | 25.25 | 23.66 |
| bofenghuang/whisper-large-v3-french-distil-dec2 | 16.59 | 18.98 | *12.79* | *14.92* | 36.25 | 36.42 | 34.37 | 33.74 |
| eustlb/distil-large-v3-fr | 11.31 | 11.34 | 10.36 | 10.52 | 31.38 | 30.32 | 28.05 | 26.43 |
| bofenghuang/whisper-large-v3-distil-fr-v0.2 | 9.44 | 9.84 | *8.94* | *9.03* | 29.40 | 28.54 | 26.17 | 23.75 |
hf-whisper-v4.3/{events.out.tfevents.1723289290.jzxh016.1378940.0 → events.out.tfevents.1729591939.jzxh176.969603.0}
RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:b48b546d1a05ebb56f696f0b6574bbf10e9ad2fbd9b786f699a375724c7eded5
+size 2586482
hf-whisper-v4.3/events.out.tfevents.1729663979.jzxh093.1022978.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c7d6fd3d984bc5cb3b9f298f744799bda523f46b4c078b2e0d5c2774ef56f5de
+size 1174878
hf-whisper-v4.3/{events.out.tfevents.1723361596.jzxh020.602378.0 → events.out.tfevents.1729698618.jzxh269.1472456.0}
RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:3deb473c36208139c97ff9ee96340d852565ac66b41a642461092a38968affd9
+size 562545
hf-whisper-v4.3/events.out.tfevents.1729714797.jzxh046.301692.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dd0ffa1d4e0908f910f00bc8dfaa0ae3aae87264bab3465c481e05876358894b
+size 2662860
hf-whisper-v4.3/events.out.tfevents.1729804962.jzxh195.623633.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fe2804c0e52dbcc56e61ac6ec5bbe8bf33d6658b5f544fb0e974bd9d29582ee8
+size 2561457
hf-whisper-v4.3/events.out.tfevents.1729878444.jzxh027.3387440.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ffcfb24947ab798e2916fa3c361706c81179ff06bb0b7bd28874ffc8ad7011a0
+size 1752562
hf-whisper-v4.3/events.out.tfevents.1729928422.jzxh043.3632435.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:541095747cf7a0217527000dfd90b7f07b724ba24a0be25bda17d8adbb8d045d
+size 554033
hf-whisper-v4.3/events.out.tfevents.1729944706.jzxh043.3643886.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:372aaece7676dad4dbc442092afd9d3e931f803fb0e1524cb53efb6185543077
+size 1602030
hf-whisper-v4.3/events.out.tfevents.1729991579.jzxh069.732416.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e814737b72e3315646d1e0f78bfb69e7fd5bd9b58ea0af000cc735d44abb204a
+size 1296514
hf-whisper-v4.3/events.out.tfevents.1730029175.jzxh019.1273419.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b76dbe87aa267705982211639819cfb356b0e56087a34c9044e054e8c66feacb
+size 2250857
hf-whisper-v4.3/events.out.tfevents.1730094586.jzxh019.1313067.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ed1df5e5d7004f831d62b04c49e5df1ff7804f1a88003492a3151336cb47d5bb
+size 88
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9ed9f27f071a5750d84acbab580ede93f872fceb33a31661c88c5121fbdd6051
 size 3025686376