bofenghuang committed
Commit · 2875f30
1 Parent(s): 6b23d76
Add v0.2
Browse files
- README.md +50 -0
- hf-whisper-v4.3/{events.out.tfevents.1723289290.jzxh016.1378940.0 → events.out.tfevents.1729591939.jzxh176.969603.0} +2 -2
- hf-whisper-v4.3/events.out.tfevents.1729663979.jzxh093.1022978.0 +3 -0
- hf-whisper-v4.3/{events.out.tfevents.1723361596.jzxh020.602378.0 → events.out.tfevents.1729698618.jzxh269.1472456.0} +2 -2
- hf-whisper-v4.3/events.out.tfevents.1729714797.jzxh046.301692.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1729804962.jzxh195.623633.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1729878444.jzxh027.3387440.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1729928422.jzxh043.3632435.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1729944706.jzxh043.3643886.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1729991579.jzxh069.732416.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1730029175.jzxh019.1273419.0 +3 -0
- hf-whisper-v4.3/events.out.tfevents.1730094586.jzxh019.1313067.0 +3 -0
- model.safetensors +1 -1
README.md
ADDED
@@ -0,0 +1,50 @@
# Whisper-Large-V3-Distil-French-v0.2

A distilled version of Whisper with 2 decoder layers, optimized for French speech-to-text.

Compared to [v0.1](https://huggingface.co/collections/bofenghuang/french-whisper-v01-64f9cc3cf625e46d12f0e4bd), this version extends the training to 30-second audio segments to maintain long-form transcription abilities. The training process used a ["patient" teacher](https://arxiv.org/abs/2106.05237) during distillation - meaning longer training times and more aggressive data augmentation - which improved overall performance.

The model uses [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3) as the teacher model while keeping the encoder architecture unchanged. This makes it suitable as a draft model for speculative decoding: by adding only 2 extra decoder layers and running the encoder just once, it can potentially deliver 2x faster inference while guaranteeing identical outputs. It can also serve as a standalone model, trading some accuracy for better efficiency: it runs 5.8x faster while using only 49% of the parameters. This [paper](https://arxiv.org/abs/2311.00430) also suggests that the distilled model may actually produce fewer hallucinations than the full model during long-form transcription.
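As an illustrative sketch (not an official recipe from this card), speculative decoding with the 🤗 Transformers pipeline could look like the following; the repo id is taken from the tables below, and `audio.wav` is a placeholder path:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

# Teacher model: produces the final (verified) tokens.
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-large-v3", torch_dtype=torch_dtype, low_cpu_mem_usage=True
).to(device)

# Distilled draft model: same encoder, only 2 decoder layers.
assistant_model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "bofenghuang/whisper-large-v3-distil-fr-v0.2",
    torch_dtype=torch_dtype,
    low_cpu_mem_usage=True,
).to(device)

processor = AutoProcessor.from_pretrained("openai/whisper-large-v3")

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    torch_dtype=torch_dtype,
    device=device,
    # The teacher verifies the drafted tokens, so the output matches
    # plain large-v3 decoding exactly.
    generate_kwargs={"assistant_model": assistant_model},
)

print(pipe("audio.wav")["text"])  # "audio.wav" is a placeholder path
```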
The model has been converted into multiple formats to ensure broad compatibility across libraries, including transformers, openai-whisper, faster-whisper, whisper.cpp, candle, and mlx.
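For instance, with faster-whisper - a sketch assuming the CTranslate2 conversion is reachable directly under this repo id (it may instead live in a subfolder, in which case you would download that folder and pass its local path):

```python
from faster_whisper import WhisperModel

# Repo id assumed from the tables below; requires the CTranslate2 weights.
model = WhisperModel(
    "bofenghuang/whisper-large-v3-distil-fr-v0.2",
    device="cuda",
    compute_type="float16",
)

# "audio.wav" is a placeholder path
segments, info = model.transcribe("audio.wav", language="fr")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```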
## Performance

The model was evaluated on both short and long-form transcriptions, using in-distribution (ID) and out-of-distribution (OOD) datasets to assess accuracy, generalizability, and robustness.

Note that Word Error Rate (WER) results shown here are [post-normalization](https://github.com/openai/whisper/blob/main/whisper/normalizers/basic.py), which includes converting text to lowercase and removing symbols and punctuation.
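As a minimal sketch of that normalization step, using the `BasicTextNormalizer` from the linked openai-whisper module together with `jiwer` for the WER computation (the example strings are invented):

```python
from jiwer import wer
from whisper.normalizers import BasicTextNormalizer

normalizer = BasicTextNormalizer()  # lowercases text, strips symbols/punctuation

reference = "Bonjour, comment allez-vous ?"
hypothesis = "bonjour comment allez vous"

# Both strings normalize to "bonjour comment allez vous", giving a WER of 0.0
print(wer(normalizer(reference), normalizer(hypothesis)))
```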
All evaluation results on the public datasets can be found [here]().

### Short-Form Transcription
| Model | mcv17 | mls | voxpopuli | mtedx | af_accented | fleurs | zaion1 | zaion2 | zaion3 | zaion4 |
|-------|-------|-----|-----------|-------|-------------|--------|--------|--------|--------|--------|
| openai/whisper-large-v3 | 10.98 | 4.68 | 11.15 | 8.65 | 7.55 | 5.38 | 24.00 | 27.52 | 32.95 | 24.14 |
| openai/whisper-large-v3-turbo | 12.25 | 5.08 | 12.21 | 9.87 | 8.37 | 5.50 | 26.49 | 28.33 | 34.80 | 24.94 |
| bofenghuang/whisper-large-v3-french | ~~*8.95*~~ | *4.68* | *9.82* | *8.33* | *5.25* | *5.14* | 22.53 | 27.51 | 29.14 | 22.44 |
| bofenghuang/whisper-large-v3-french-distil-dec16 | ~~*8.86*~~ | *4.28* | *9.66* | *8.14* | *4.93* | *5.37* | 21.70 | 25.20 | 28.83 | 20.46 |
| bofenghuang/whisper-large-v3-french-distil-dec2 | ~~*10.52*~~ | *5.34* | *10.59* | *9.37* | *5.68* | *7.30* | 24.91 | 29.57 | 32.34 | 24.46 |
| eustlb/distil-large-v3-fr | *12.64* | *5.84* | 11.84 | 9.65 | 8.83 | 7.81 | 24.34 | 28.77 | 34.05 | 24.10 |
| bofenghuang/whisper-large-v3-distil-fr-v0.2 | *11.10* | *5.00* | *10.68* | *8.75* | *7.09* | 6.35 | 23.01 | 26.91 | 31.46 | 22.33 |

*Italic* indicates in-distribution (ID) evaluation, where test sets correspond to data distributions seen during training, typically yielding higher performance than out-of-distribution (OOD) evaluation. *~~Italic and strikethrough~~* denotes potential test set contamination - for example, when training and evaluation use different versions of Common Voice, raising the possibility of overlapping data.

Due to the limited availability of out-of-distribution (OOD) and long-form French test sets, evaluation was also performed using internal test sets from [Zaion Lab](https://zaion.ai/) - consisting of human-annotated call center conversations with significant background noise and domain-specific terminology.

### Long-Form Transcription
Long-form transcription evaluation used the 🤗 Hugging Face [`pipeline`](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline) with both [chunked](https://huggingface.co/blog/asr-chunking) (chunk_length_s=30) and original sequential decoding methods.
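A sketch of the chunked setup (repo id assumed from the tables; in recent transformers versions, omitting `chunk_length_s` falls back to Whisper's sequential long-form decoding):

```python
from transformers import pipeline

pipe = pipeline(
    "automatic-speech-recognition",
    model="bofenghuang/whisper-large-v3-distil-fr-v0.2",
    chunk_length_s=30,  # cut long audio into 30-second chunks decoded in parallel
    batch_size=8,       # number of chunks per forward pass
)

# "long_audio.wav" is a placeholder path; timestamps help inspect segment boundaries
result = pipe("long_audio.wav", return_timestamps=True)
print(result["text"])
```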
| Model | community-v2/dev_data | | mtedx | | zaion5 | | zaion6 | |
|-------|---------|------------|---------|------------|---------|------------|---------|------------|
| | chunked | sequential | chunked | sequential | chunked | sequential | chunked | sequential |
| openai/whisper-large-v3 | 9.89 | 8.97 | 9.00 | 8.01 | 40.76 | 30.49 | 32.08 | 25.56 |
| openai/whisper-large-v3-turbo | 10.11 | 9.00 | 8.49 | 8.45 | 34.59 | 29.35 | 30.00 | 24.84 |
| bofenghuang/whisper-large-v3-french | 9.33 | 9.99 | *9.85* | *9.49* | 35.92 | 29.01 | 29.03 | 23.55 |
| bofenghuang/whisper-large-v3-french-distil-dec16 | 8.97 | 10.11 | *9.61* | *11.72* | 27.14 | 27.57 | 25.25 | 23.66 |
| bofenghuang/whisper-large-v3-french-distil-dec2 | 16.59 | 18.98 | *12.79* | *14.92* | 36.25 | 36.42 | 34.37 | 33.74 |
| eustlb/distil-large-v3-fr | 11.31 | 11.34 | 10.36 | 10.52 | 31.38 | 30.32 | 28.05 | 26.43 |
| bofenghuang/whisper-large-v3-distil-fr-v0.2 | 9.44 | 9.84 | *8.94* | *9.03* | 29.40 | 28.54 | 26.17 | 23.75 |
hf-whisper-v4.3/{events.out.tfevents.1723289290.jzxh016.1378940.0 → events.out.tfevents.1729591939.jzxh176.969603.0}
RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:b48b546d1a05ebb56f696f0b6574bbf10e9ad2fbd9b786f699a375724c7eded5
+size 2586482
hf-whisper-v4.3/events.out.tfevents.1729663979.jzxh093.1022978.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c7d6fd3d984bc5cb3b9f298f744799bda523f46b4c078b2e0d5c2774ef56f5de
+size 1174878
hf-whisper-v4.3/{events.out.tfevents.1723361596.jzxh020.602378.0 → events.out.tfevents.1729698618.jzxh269.1472456.0}
RENAMED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:3deb473c36208139c97ff9ee96340d852565ac66b41a642461092a38968affd9
+size 562545
hf-whisper-v4.3/events.out.tfevents.1729714797.jzxh046.301692.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dd0ffa1d4e0908f910f00bc8dfaa0ae3aae87264bab3465c481e05876358894b
+size 2662860
hf-whisper-v4.3/events.out.tfevents.1729804962.jzxh195.623633.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:fe2804c0e52dbcc56e61ac6ec5bbe8bf33d6658b5f544fb0e974bd9d29582ee8
+size 2561457
hf-whisper-v4.3/events.out.tfevents.1729878444.jzxh027.3387440.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ffcfb24947ab798e2916fa3c361706c81179ff06bb0b7bd28874ffc8ad7011a0
+size 1752562
hf-whisper-v4.3/events.out.tfevents.1729928422.jzxh043.3632435.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:541095747cf7a0217527000dfd90b7f07b724ba24a0be25bda17d8adbb8d045d
+size 554033
hf-whisper-v4.3/events.out.tfevents.1729944706.jzxh043.3643886.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:372aaece7676dad4dbc442092afd9d3e931f803fb0e1524cb53efb6185543077
+size 1602030
hf-whisper-v4.3/events.out.tfevents.1729991579.jzxh069.732416.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e814737b72e3315646d1e0f78bfb69e7fd5bd9b58ea0af000cc735d44abb204a
+size 1296514
hf-whisper-v4.3/events.out.tfevents.1730029175.jzxh019.1273419.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b76dbe87aa267705982211639819cfb356b0e56087a34c9044e054e8c66feacb
+size 2250857
hf-whisper-v4.3/events.out.tfevents.1730094586.jzxh019.1313067.0
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ed1df5e5d7004f831d62b04c49e5df1ff7804f1a88003492a3151336cb47d5bb
+size 88
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:9ed9f27f071a5750d84acbab580ede93f872fceb33a31661c88c5121fbdd6051
 size 3025686376