bofenghuang committed on
Commit 2875f30 · 1 Parent(s): 6b23d76
README.md ADDED
@@ -0,0 +1,50 @@

# Whisper-Large-V3-Distil-French-v0.2

A distilled version of Whisper with 2 decoder layers, optimized for French speech-to-text.

Compared to [v0.1](https://huggingface.co/collections/bofenghuang/french-whisper-v01-64f9cc3cf625e46d12f0e4bd), this version extends training to 30-second audio segments to preserve long-form transcription ability. Distillation used a ["patient" teacher](https://arxiv.org/abs/2106.05237), meaning longer training and more aggressive data augmentation, which improved overall performance.

The model uses [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) as the teacher and keeps the encoder architecture unchanged. This makes it suitable as a draft model for speculative decoding with whisper-large-v3: it adds only 2 extra decoder layers, the encoder runs just once, and inference can potentially be about 2x faster while the outputs stay identical to those of the main model. It can also serve as a standalone model that trades some accuracy for efficiency, running 5.8x faster with only 49% of the parameters. The [Distil-Whisper paper](https://arxiv.org/abs/2311.00430) also suggests that distilled models may produce fewer hallucinations than the full model in long-form transcription.
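Why speculative decoding keeps the output identical can be illustrated with a toy sketch of greedy speculative decoding. Everything below is hypothetical: tokens are plain integers, and `main_model` and `draft_model` are stand-in functions, not Whisper decoders. In 🤗 Transformers the draft model is passed to `generate` via the `assistant_model` argument and verification runs as one batched forward pass, rather than the per-position calls used here for readability.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a decoder: maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]
EOS = 0  # hypothetical end-of-sequence token id


def greedy_decode(model: NextToken, prefix: List[int], max_len: int) -> List[int]:
    """Baseline: one main-model step per generated token."""
    out = list(prefix)
    while len(out) < max_len:
        tok = model(out)
        out.append(tok)
        if tok == EOS:
            break
    return out


def speculative_decode(main: NextToken, draft: NextToken, prefix: List[int],
                       max_len: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification rounds)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    out, rounds = list(prefix), 0
    while len(out) < max_len:
        rounds += 1
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
            if tok == EOS:
                break
        # 2. The main model verifies the proposal position by position.
        for tok in proposal:
            expected = main(out)
            out.append(expected)  # always keep the main model's own choice
            if expected != tok or expected == EOS or len(out) >= max_len:
                break  # first mismatch: discard the remaining draft tokens
        if out[-1] == EOS:
            break
    return out, rounds


# Toy models: main counts 1, 2, ..., 9 and then emits EOS; draft is wrong after token 3.
def main_model(seq: List[int]) -> int:
    return seq[-1] + 1 if seq[-1] < 9 else EOS


def draft_model(seq: List[int]) -> int:
    return 5 if seq[-1] == 3 else main_model(seq)


tokens, rounds = speculative_decode(main_model, draft_model, [1], max_len=20)
assert tokens == greedy_decode(main_model, [1], max_len=20)
print(tokens, rounds)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 0] 3
```

Every appended token is the main model's own greedy choice, so the result matches plain greedy decoding; a good draft only reduces the number of sequential rounds (3 rounds instead of 9 single-token steps in this toy run).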

The model has been converted into multiple formats for compatibility with libraries including transformers, openai-whisper, faster-whisper, whisper.cpp, candle, and mlx.

## Performance

The model was evaluated on both short-form and long-form transcription, using in-distribution (ID) and out-of-distribution (OOD) datasets to assess accuracy, generalization, and robustness.

Word Error Rate (WER) results shown here are computed after [normalization](https://github.com/openai/whisper/blob/main/whisper/normalizers/basic.py), which converts text to lowercase and removes symbols and punctuation.
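To make the normalization step concrete, here is a minimal sketch of WER computed after a simplified normalizer in the spirit of the linked `basic.py`: it lowercases, drops bracketed and parenthesized spans, and replaces symbols and punctuation with spaces. It is an approximation written for illustration, not the exact normalizer or scoring script behind these results; `basic_normalize` and `word_error_rate` are hypothetical names.

```python
import re
import unicodedata
from typing import List


def basic_normalize(text: str) -> str:
    """Simplified normalization: lowercase, drop bracketed spans, strip symbols/punctuation."""
    text = unicodedata.normalize("NFKC", text.lower())
    text = re.sub(r"[<\[][^>\]]*[>\]]", "", text)  # drop spans such as "[bruit]" or "<music>"
    text = re.sub(r"\(([^)]+?)\)", "", text)       # drop spans such as "(rires)"
    # Replace marks (M), symbols (S) and punctuation (P) with spaces; keep letters and digits.
    text = "".join(" " if unicodedata.category(c)[0] in "MSP" else c for c in text)
    return " ".join(text.split())  # collapse runs of whitespace


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = basic_normalize(reference).split()
    hyp: List[str] = basic_normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance with a single rolling row:
    # before processing row i, d[j] is the distance between ref[:i-1] and hyp[:j].
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        diag, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            above = d[j]
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[j] = min(above + 1,     # deletion of ref[i-1]
                       d[j - 1] + 1,  # insertion of hyp[j-1]
                       diag + cost)   # match or substitution
            diag = above
    return d[-1] / len(ref)


print(word_error_rate("Bonjour, tout le monde !", "bonjour tout le monde"))  # 0.0
print(word_error_rate("Le chat dort.", "Le chien dort"))                    # 0.3333333333333333
```

Because casing and punctuation are removed before scoring, "Bonjour, tout le monde !" and "bonjour tout le monde" count as a perfect match. Note that apostrophes are also replaced, so "l'été" is scored as the two words "l" and "été".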

All evaluation results on the public datasets can be found [here]().

### Short-Form Transcription

| Model | mcv17 | mls | voxpopuli | mtedx | af_accented | fleurs | zaion1 | zaion2 | zaion3 | zaion4 |
|-------|--------|-----|------------|--------|--------------|---------|---------|---------|---------|---------|
| openai/whisper-large-v3 | 10.98 | 4.68 | 11.15 | 8.65 | 7.55 | 5.38 | 24.00 | 27.52 | 32.95 | 24.14 |
| openai/whisper-large-v3-turbo | 12.25 | 5.08 | 12.21 | 9.87 | 8.37 | 5.50 | 26.49 | 28.33 | 34.80 | 24.94 |
| bofenghuang/whisper-large-v3-french | ~~*8.95*~~ | *4.68* | *9.82* | *8.33* | *5.25* | *5.14* | 22.53 | 27.51 | 29.14 | 22.44 |
| bofenghuang/whisper-large-v3-french-distil-dec16 | ~~*8.86*~~ | *4.28* | *9.66* | *8.14* | *4.93* | *5.37* | 21.70 | 25.20 | 28.83 | 20.46 |
| bofenghuang/whisper-large-v3-french-distil-dec2 | ~~*10.52*~~ | *5.34* | *10.59* | *9.37* | *5.68* | *7.30* | 24.91 | 29.57 | 32.34 | 24.46 |
| eustlb/distil-large-v3-fr | *12.64* | *5.84* | 11.84 | 9.65 | 8.83 | 7.81 | 24.34 | 28.77 | 34.05 | 24.10 |
| bofenghuang/whisper-large-v3-distil-fr-v0.2 | *11.10* | *5.00* | *10.68* | *8.75* | *7.09* | 6.35 | 23.01 | 26.91 | 31.46 | 22.33 |

*Italic* marks in-distribution (ID) evaluation, where the test set matches a data distribution seen during training; ID results typically show lower WER than out-of-distribution (OOD) evaluation. *~~Italic and strikethrough~~* marks potential test set contamination, for example when training and evaluation use different versions of Common Voice, so the splits may overlap.

Because few out-of-distribution (OOD) and long-form French test sets are available, evaluation also used internal test sets from [Zaion Lab](https://zaion.ai/) (the zaion columns in the tables): human-annotated call center conversations with significant background noise and domain-specific terminology.

### Long-Form Transcription

Long-form transcription was evaluated with the 🤗 Hugging Face [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) using both [chunked](https://huggingface.co/blog/asr-chunking) decoding (`chunk_length_s=30`) and the original sequential decoding algorithm.
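The windowing step of chunked decoding can be sketched as follows: the recording is cut into fixed-length windows that overlap by a stride on each side, so every window can be transcribed independently (and batched). This is a simplified sketch, not the pipeline's code; `chunk_windows` and `Chunk` are hypothetical names, 16 kHz audio is assumed, and the stride is assumed to default to one sixth of the chunk length per side.

```python
from typing import List, NamedTuple, Optional


class Chunk(NamedTuple):
    start: int         # first audio sample fed to the model for this chunk
    end: int           # one past the last sample
    stride_left: int   # leading overlap shared with the previous chunk
    stride_right: int  # trailing overlap shared with the next chunk


def chunk_windows(num_samples: int, sampling_rate: int = 16_000,
                  chunk_length_s: float = 30.0,
                  stride_length_s: Optional[float] = None) -> List[Chunk]:
    """Split a long recording into overlapping fixed-length windows."""
    if stride_length_s is None:
        stride_length_s = chunk_length_s / 6  # assumption: 5 s per side for 30 s chunks
    chunk_len = int(round(chunk_length_s * sampling_rate))
    stride = int(round(stride_length_s * sampling_rate))
    step = chunk_len - 2 * stride  # neighbouring windows overlap by 2 * stride samples
    if step <= 0:
        raise ValueError("stride_length_s is too large for chunk_length_s")
    chunks: List[Chunk] = []
    for start in range(0, num_samples, step):
        is_last = start + chunk_len >= num_samples
        chunks.append(Chunk(start=start,
                            end=min(start + chunk_len, num_samples),
                            stride_left=0 if start == 0 else stride,
                            stride_right=0 if is_last else stride))
        if is_last:
            break
    return chunks


# 60 s of 16 kHz audio -> windows covering 0-30 s, 20-50 s and 40-60 s.
for c in chunk_windows(60 * 16_000):
    print(c.start / 16_000, c.end / 16_000, c.stride_left / 16_000, c.stride_right / 16_000)
```

Trimming each window by its strides leaves non-overlapping spans that tile the recording (0-25 s, 25-45 s, 45-60 s here). For Whisper, the pipeline merges the overlapping chunk transcriptions rather than simply cutting them at these boundaries, so only the windowing is shown. Sequential decoding instead processes 30-second windows one after another, shifting each window according to the timestamps predicted in the previous one; it is not sketched here.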

| Model | community-v2/dev_data (chunked) | community-v2/dev_data (sequential) | mtedx (chunked) | mtedx (sequential) | zaion5 (chunked) | zaion5 (sequential) | zaion6 (chunked) | zaion6 (sequential) |
|-------|-----------|-----------|---------|-----------|---------|-----------|---------|-----------|
| openai/whisper-large-v3 | 9.89 | 8.97 | 9.00 | 8.01 | 40.76 | 30.49 | 32.08 | 25.56 |
| openai/whisper-large-v3-turbo | 10.11 | 9.00 | 8.49 | 8.45 | 34.59 | 29.35 | 30.00 | 24.84 |
| bofenghuang/whisper-large-v3-french | 9.33 | 9.99 | *9.85* | *9.49* | 35.92 | 29.01 | 29.03 | 23.55 |
| bofenghuang/whisper-large-v3-french-distil-dec16 | 8.97 | 10.11 | *9.61* | *11.72* | 27.14 | 27.57 | 25.25 | 23.66 |
| bofenghuang/whisper-large-v3-french-distil-dec2 | 16.59 | 18.98 | *12.79* | *14.92* | 36.25 | 36.42 | 34.37 | 33.74 |
| eustlb/distil-large-v3-fr | 11.31 | 11.34 | 10.36 | 10.52 | 31.38 | 30.32 | 28.05 | 26.43 |
| bofenghuang/whisper-large-v3-distil-fr-v0.2 | 9.44 | 9.84 | *8.94* | *9.03* | 29.40 | 28.54 | 26.17 | 23.75 |

hf-whisper-v4.3/{events.out.tfevents.1723289290.jzxh016.1378940.0 → events.out.tfevents.1729591939.jzxh176.969603.0} RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:3b8ac41e73fd5f4b676d05d5a9cd879e23441693097f0e6f6bfb9a8af8ab8afd
- size 2238312
+ oid sha256:b48b546d1a05ebb56f696f0b6574bbf10e9ad2fbd9b786f699a375724c7eded5
+ size 2586482
hf-whisper-v4.3/events.out.tfevents.1729663979.jzxh093.1022978.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7d6fd3d984bc5cb3b9f298f744799bda523f46b4c078b2e0d5c2774ef56f5de
+ size 1174878
hf-whisper-v4.3/{events.out.tfevents.1723361596.jzxh020.602378.0 → events.out.tfevents.1729698618.jzxh269.1472456.0} RENAMED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:021fe2064c5d5c2d4e1d92baad33d9ea6fc5dc58176b57bb0c79d5a8dc984377
- size 448330
+ oid sha256:3deb473c36208139c97ff9ee96340d852565ac66b41a642461092a38968affd9
+ size 562545
hf-whisper-v4.3/events.out.tfevents.1729714797.jzxh046.301692.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:dd0ffa1d4e0908f910f00bc8dfaa0ae3aae87264bab3465c481e05876358894b
+ size 2662860
hf-whisper-v4.3/events.out.tfevents.1729804962.jzxh195.623633.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fe2804c0e52dbcc56e61ac6ec5bbe8bf33d6658b5f544fb0e974bd9d29582ee8
+ size 2561457
hf-whisper-v4.3/events.out.tfevents.1729878444.jzxh027.3387440.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ffcfb24947ab798e2916fa3c361706c81179ff06bb0b7bd28874ffc8ad7011a0
+ size 1752562
hf-whisper-v4.3/events.out.tfevents.1729928422.jzxh043.3632435.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:541095747cf7a0217527000dfd90b7f07b724ba24a0be25bda17d8adbb8d045d
+ size 554033
hf-whisper-v4.3/events.out.tfevents.1729944706.jzxh043.3643886.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:372aaece7676dad4dbc442092afd9d3e931f803fb0e1524cb53efb6185543077
+ size 1602030
hf-whisper-v4.3/events.out.tfevents.1729991579.jzxh069.732416.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e814737b72e3315646d1e0f78bfb69e7fd5bd9b58ea0af000cc735d44abb204a
+ size 1296514
hf-whisper-v4.3/events.out.tfevents.1730029175.jzxh019.1273419.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b76dbe87aa267705982211639819cfb356b0e56087a34c9044e054e8c66feacb
+ size 2250857
hf-whisper-v4.3/events.out.tfevents.1730094586.jzxh019.1313067.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ed1df5e5d7004f831d62b04c49e5df1ff7804f1a88003492a3151336cb47d5bb
+ size 88
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:85fc58d475b8cb5afcb535aa5278a3962d6eb967fb33eca992ae1bee5ff487d9
+ oid sha256:9ed9f27f071a5750d84acbab580ede93f872fceb33a31661c88c5121fbdd6051
  size 3025686376
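The `model.safetensors` pointer gets a new `oid` but keeps the same `size`, which is consistent with retrained weights replacing old ones of identical shape and dtype. The sketch below parses both Git LFS pointers and turns the file size into a rough parameter count. It assumes float32 storage (4 bytes per parameter), ignores the small JSON header at the start of a safetensors file, and uses the commonly quoted ~1.55B parameters for whisper-large-v3; `parse_lfs_pointer` is a hypothetical helper. Under these assumptions the estimate agrees with the README's "49% of the parameters".

```python
from typing import Dict

# The two model.safetensors pointers from the diff above. A Git LFS pointer file
# is a short list of "key value" lines.
OLD_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:85fc58d475b8cb5afcb535aa5278a3962d6eb967fb33eca992ae1bee5ff487d9
size 3025686376
"""
NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9ed9f27f071a5750d84acbab580ede93f872fceb33a31661c88c5121fbdd6051
size 3025686376
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Parse a Git LFS pointer into a {key: value} dict (hypothetical helper)."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


old, new = parse_lfs_pointer(OLD_POINTER), parse_lfs_pointer(NEW_POINTER)
weights_changed = old["oid"] != new["oid"]  # different content hash: new weights uploaded
same_size = old["size"] == new["size"]      # consistent with unchanged shapes and dtype

# Rough parameter count. ASSUMPTIONS: float32 storage and a negligible safetensors header.
BYTES_PER_PARAM = 4
TEACHER_PARAMS = 1_550_000_000  # commonly quoted size of whisper-large-v3
approx_params = int(new["size"]) // BYTES_PER_PARAM
ratio = approx_params / TEACHER_PARAMS
print(f"~{approx_params / 1e6:.0f}M parameters, {ratio:.0%} of the teacher")
# ~756M parameters, 49% of the teacher
```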