---
language: es
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
---

# Wav2Vec2-Large-XLSR-53-Spanish-With-LM

This is a copy of the [Wav2Vec2-Large-XLSR-53-Spanish](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-spanish) model
with added language model support.

This model card can be seen as a demo for the [pyctcdecode](https://github.com/kensho-technologies/pyctcdecode) integration
with Transformers introduced in [this PR](https://github.com/huggingface/transformers/pull/14339). The PR explains in detail how the
integration works.

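As rough intuition for what language model support adds at decoding time: decoders in the style of pyctcdecode rank candidate transcriptions by combining the acoustic score with a weighted language model score and a word-count bonus. The sketch below is a toy illustration only. The function `fused_score`, the weights `alpha` and `beta`, the candidate strings, and all scores are assumptions made up for this example; they are not taken from this model or from the PR.

```python
# Toy illustration of combining acoustic and language-model scores.
# All names and numbers here are hypothetical, chosen for the example.

def fused_score(acoustic_logp, lm_logp, n_words, alpha=0.5, beta=1.5):
    """Acoustic log-probability plus weighted LM log-probability plus a word bonus."""
    return acoustic_logp + alpha * lm_logp + beta * n_words


candidates = [
    # Well-formed Spanish: slightly worse acoustically, much better under the LM.
    {"text": "bien y que regalo", "acoustic_logp": -4.0, "lm_logp": -6.0},
    # Phonetically similar but misspelled: best acoustically, poor under the LM.
    {"text": "vien i ke regalo", "acoustic_logp": -3.8, "lm_logp": -14.0},
]

# Acoustic-only ranking (what plain argmax decoding effectively relies on).
best_acoustic = max(candidates, key=lambda c: c["acoustic_logp"])

# Ranking with the language model taken into account.
best_fused = max(
    candidates,
    key=lambda c: fused_score(c["acoustic_logp"], c["lm_logp"], len(c["text"].split())),
)

print("acoustic only:", best_acoustic["text"])
print("with LM:      ", best_fused["text"])
```

In a real decoder this scoring is applied to partial hypotheses during beam search rather than to a fixed list of finished candidates.
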
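In a nutshell, the PR adds a new `Wav2Vec2ProcessorWithLM` class as a drop-in replacement for `Wav2Vec2Processor`.

The only changes to the existing ASR pipeline are the processor class and the decoding step, which now receives the raw logits instead of argmax token IDs:

```diff
 import torch
 import torchaudio.functional as F
 from datasets import load_dataset
-from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

 ds = load_dataset("common_voice", "es", split="test", streaming=True)
 sample = next(iter(ds))
 # Common Voice audio is sampled at 48 kHz; the model expects 16 kHz input.
 audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()

 model = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
-processor = Wav2Vec2Processor.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
+processor = Wav2Vec2ProcessorWithLM.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")

 input_values = processor(audio, sampling_rate=16_000, return_tensors="pt").input_values

 with torch.no_grad():
     logits = model(input_values).logits

-prediction_ids = torch.argmax(logits, dim=-1)
-transcription = processor.batch_decode(prediction_ids)
+transcription = processor.batch_decode(logits.numpy()).text

 print(transcription)
```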
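
For readers unfamiliar with the `torch.argmax` decoding step that appears in the pipeline above, here is a minimal, self-contained sketch of greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, then drop the blank token. The function name `ctc_greedy_decode`, the tiny vocabulary, and the choice of `0` as the blank ID are assumptions for illustration; the real tokenizer also handles word delimiters and special tokens.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then remove CTC blanks."""
    # Step 1: merge runs of identical consecutive IDs into a single ID.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop the blank token, which only separates repeated characters.
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical vocabulary; ID 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "h", 2: "o", 3: "l", 4: "a"}

# Per-frame argmax IDs, as torch.argmax(logits, dim=-1) would produce for one utterance.
frame_ids = [1, 1, 0, 2, 2, 0, 3, 0, 0, 4, 4]

print(ctc_greedy_decode(frame_ids, vocab))  # hola
```

Greedy decoding looks at each frame in isolation, which is why a language model that scores whole word sequences can correct some of its errors.
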

| Model | WER | CER |
| ------------- | ------------- | ------------- |
| jonatasgrosman/wav2vec2-large-xlsr-53-spanish | **8.81%** | **2.70%** |
| pcuenq/wav2vec2-large-xlsr-53-es | 10.55% | 3.20% |
| facebook/wav2vec2-large-xlsr-53-spanish | 16.99% | 5.40% |
| mrm8488/wav2vec2-large-xlsr-53-spanish | 19.20% | 5.96% |
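
The card does not say how the WER and CER figures in the table were computed. As a reminder of the standard definitions, the sketch below computes both as an edit distance divided by the reference length, over words for WER and over characters for CER. The helper names and the example sentences are made up for illustration, and real evaluation scripts usually normalize text (casing, punctuation) first, which this sketch does not do.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over the reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "hola como estas hoy"   # hypothetical ground truth
hypothesis = "hola como esta hoy"   # hypothetical model output

print(f"WER: {wer(reference, hypothesis):.2%}")
print(f"CER: {cer(reference, hypothesis):.2%}")
```

One wrong word out of four gives a WER of 25%, while the same error costs only one character out of nineteen, which is why CER is typically much lower than WER.
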