---
license: apache-2.0
language:
- es
tags:
- whisper-event
- generated_from_trainer
datasets:
- mozilla-foundation/common_voice_11_0
- google/fleurs
- facebook/multilingual_librispeech
- facebook/voxpopuli
metrics:
- wer
model-index:
- name: openai/whisper-medium
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: mozilla-foundation/common_voice_11_0 es
      type: mozilla-foundation/common_voice_11_0
      config: es
      split: test
      args: es
    metrics:
    - name: Wer
      type: wer
      value: 6.346473676004366
    - name: Cer
      type: cer
      value: 2.1391
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: FLEURS ASR
      type: google/fleurs
      config: es_419
      split: test
      args: es
    metrics:
    - name: WER
      type: wer
      value: 4.0266
    - name: Cer
      type: cer
      value: 1.6631
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Multilingual LibriSpeech
      type: facebook/multilingual_librispeech
      config: spanish
      split: test
      args:
        language: es
    metrics:
    - name: WER
      type: wer
      value: 4.6644
    - name: Cer
      type: cer
      value: 1.7056
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: VoxPopuli
      type: facebook/voxpopuli
      config: es
      split: test
      args:
        language: es
    metrics:
    - name: WER
      type: wer
      value: 8.3668
    - name: Cer
      type: cer
      value: 5.479
---

# openai/whisper-medium-mix-es

This model is a fine-tuned version of [openai/whisper-medium](https://huggingface.co/openai/whisper-medium) on Spanish speech from the datasets listed under "Training and evaluation data".
It achieves the following results on the common_voice_11_0 evaluation set (es, test split):
- Loss: 0.1344
- Wer: 6.3465

Using the evaluation script provided for the Whisper Sprint (Dec. 2022), the model achieves these results on the other test sets (WER):
- **google/fleurs: 4.0266 %**
  (python [run_eval_whisper_streaming.py](https://github.com/huggingface/community-events/blob/main/whisper-fine-tuning-event/run_eval_whisper_streaming.py) --model_id="deepdml/whisper-medium-mix-es" --dataset="google/fleurs" --config="es_419" --device=0 --language="es")
- **facebook/multilingual_librispeech: 4.6644 %**
  (python run_eval_whisper_streaming.py --model_id="deepdml/whisper-medium-mix-es" --dataset="facebook/multilingual_librispeech" --config="spanish" --device=0 --language="es")
- **facebook/voxpopuli: 8.3668 %**
  (python run_eval_whisper_streaming.py --model_id="deepdml/whisper-medium-mix-es" --dataset="facebook/voxpopuli" --config="es" --device=0 --language="es")

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

Training data used:
- **mozilla-foundation/common_voice_11_0:** es, train+validation
- **google/fleurs:** es_419, train
- **facebook/multilingual_librispeech:** spanish, train
- **facebook/voxpopuli:** es, train

Evaluation uses the test split of each dataset above.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 32
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 5000
- mixed_precision_training: Native AMP

### Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 0.266         | 0.2   | 1000 | 0.1657          | 8.0395 |
| 0.1394        | 0.4   | 2000 | 0.1539          | 7.3937 |
| 0.1316        | 0.6   | 3000 | 0.1452          | 6.9656 |
| 0.1165        | 0.8   | 4000 | 0.1392          | 6.5765 |
| 0.2816        | 1.0   | 5000 | 0.1344          | 6.3465 |

### Framework versions

- Transformers 4.26.0.dev0
- Pytorch 1.13.0+cu117
- Datasets 2.7.1.dev0
- Tokenizers 0.13.2
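
### Learning-rate schedule sketch

The hyperparameters above list a `linear` scheduler with 500 warmup steps, 5000 training steps, and a peak learning rate of 1e-05. The card does not spell out the exact shape, so the following is a minimal sketch that assumes the usual Transformers-style linear schedule: a linear ramp from 0 to the peak during warmup, then a linear decay to 0 at the final step. The function name `linear_schedule_lr` is hypothetical and only serves this illustration.

```python
def linear_schedule_lr(step: int,
                       base_lr: float = 1e-5,
                       warmup_steps: int = 500,
                       total_steps: int = 5000) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay.

    Assumption: the run followed the common linear-with-warmup shape;
    the defaults mirror the hyperparameters listed in this card.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: fall linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at the checkpoints reported in the training results table.
    for step in (0, 500, 1000, 2000, 3000, 4000, 5000):
        print(step, linear_schedule_lr(step))
```

Under this assumption the rate peaks at step 500 and has already decayed noticeably by the first logged checkpoint at step 1000.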
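
### WER computation sketch

The WER figures in this card are percentages. As a rough illustration of what the metric measures, the sketch below computes the word error rate as 100 × (substitutions + deletions + insertions) / reference word count, using a word-level edit distance. It is not the evaluation script referenced above: it assumes plain whitespace tokenization and applies no text normalization, so its numbers are not expected to match the reported ones. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent between a reference and a hypothesis transcript.

    Assumption: words are split on whitespace, with no casing or
    punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing in hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("el gato duerme aquí", "el perro duerme aquí"))
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100 % for very poor hypotheses.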