metadata
language: es
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
Wav2Vec2-Large-XLSR-53-Spanish-With-LM
This is a copy of the Wav2Vec2-Large-XLSR-53-Spanish model with language model support added.
This model card serves as a demo of the pyctcdecode integration with Transformers introduced by this PR. The PR explains in detail how the integration works.
In a nutshell, the PR adds a new Wav2Vec2WithLMProcessor class (released in Transformers as Wav2Vec2ProcessorWithLM) as a drop-in replacement for Wav2Vec2Processor.
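The only changes from the existing ASR pipeline are the processor class and the decoding call. For reference, the existing pipeline with greedy (argmax) decoding is: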
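import torch
import torchaudio.functional as F
from datasets import load_dataset
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
# Stream a single sample from the Spanish Common Voice test split.
ds = load_dataset("common_voice", "es", split="test", streaming=True)
sample = next(iter(ds))
# The model expects 16 kHz audio, so resample from the clip's native rate.
waveform = torch.tensor(sample["audio"]["array"], dtype=torch.float32)
audio = F.resample(waveform, sample["audio"]["sampling_rate"], 16_000).numpy()
model = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
processor = Wav2Vec2Processor.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
input_values = processor(audio, sampling_rate=16_000, return_tensors="pt").input_values
with torch.no_grad():
    logits = model(input_values).logits
# Greedy decoding: pick the most likely token at every frame.
prediction_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(prediction_ids)
print(transcription)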
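With the LM-backed processor, the same pipeline might look like the sketch below. It assumes the class name under which the integration was released in Transformers (Wav2Vec2ProcessorWithLM) and that pyctcdecode and kenlm are installed. The helper name transcribe_with_lm is hypothetical and only wraps the calls for illustration.

```python
def transcribe_with_lm(audio, model_id="patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm"):
    """Transcribe a 16 kHz mono waveform (1-D float array) using LM-boosted decoding.

    `transcribe_with_lm` is a hypothetical helper, not part of Transformers.
    """
    # Imported lazily so the heavy dependencies load only when the function is called.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)

    input_values = processor(audio, sampling_rate=16_000, return_tensors="pt").input_values
    with torch.no_grad():
        logits = model(input_values).logits

    # No argmax step: the LM decoder consumes the raw logits as a NumPy array.
    return processor.batch_decode(logits.numpy()).text
```

Compared with the greedy pipeline, only two things differ: the processor class, and the fact that batch_decode receives the logits themselves instead of argmax token ids.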
| Model | WER | CER |
|---|---|---|
| jonatasgrosman/wav2vec2-large-xlsr-53-spanish | 8.81% | 2.70% |
| pcuenq/wav2vec2-large-xlsr-53-es | 10.55% | 3.20% |
| facebook/wav2vec2-large-xlsr-53-spanish | 16.99% | 5.40% |
| mrm8488/wav2vec2-large-xlsr-53-spanish | 19.20% | 5.96% |
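For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly computed: the Levenshtein edit distance between reference and hypothesis, over words or characters, divided by the reference length. It assumes these standard definitions and is not the evaluation script that produced the numbers above. The function names and example strings are made up for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one of three reference words is missing.
print(wer("hola mundo cruel", "hola mundo"))
# Hypothetical example: one of four reference characters is missing.
print(cer("hola", "ola"))
```

Real evaluations usually normalize text first (casing, punctuation), which can shift both numbers noticeably.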