---
language: es
datasets:
- common_voice
metrics:
- wer
- cer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
---
# Wav2Vec2-Large-XLSR-53-Spanish-With-LM
This is a copy of [Wav2Vec2-Large-XLSR-53-Spanish](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-spanish)
with added language model support.
This model card serves as a demo for the [pyctcdecode](https://github.com/kensho-technologies/pyctcdecode) integration
with Transformers introduced in [this PR](https://github.com/huggingface/transformers/pull/14339). The PR explains in detail how the
integration works.
In a nutshell: the PR adds a new `Wav2Vec2ProcessorWithLM` class as a drop-in replacement for `Wav2Vec2Processor`.
The only changes to existing ASR code are:
```diff
import torch
import torchaudio.functional as F
-from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor
+from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
from datasets import load_dataset
ds = load_dataset("common_voice", "es", split="test", streaming=True)
sample = next(iter(ds))
resampled_audio = F.resample(torch.tensor(sample["audio"]["array"]), 48_000, 16_000).numpy()
model = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
-processor = Wav2Vec2Processor.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
+processor = Wav2Vec2ProcessorWithLM.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
input_values = processor(resampled_audio, return_tensors="pt").input_values
with torch.no_grad():
    logits = model(input_values).logits
-prediction_ids = torch.argmax(logits, dim=-1)
-transcription = processor.batch_decode(prediction_ids)
+transcription = processor.batch_decode(logits.cpu().numpy()).text
print(transcription)
```
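For context, the two removed lines perform greedy (best-path) CTC decoding: take the most likely token in each frame, collapse consecutive repeats, and drop the blank token. `Wav2Vec2ProcessorWithLM` instead hands the raw logits to pyctcdecode, which runs a beam search scored with an n-gram language model. The following is a minimal, dependency-free sketch of the greedy step only. The toy vocabulary, the `one_hot` helper, and the frame scores are illustrative assumptions, not the model's real vocabulary or output.

```python
# Toy illustration of greedy CTC decoding; vocabulary and scores are made up.
VOCAB = ["<pad>", "h", "o", "l", "a"]  # index 0 plays the role of the CTC blank
BLANK_ID = 0


def greedy_ctc_decode(frame_scores, vocab=VOCAB, blank_id=BLANK_ID):
    """Decode one utterance from per-frame scores (a list of lists)."""
    # 1. Pick the highest-scoring token in every frame (the argmax step).
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # 2. Collapse consecutive repeats and 3. drop blanks.
    decoded = []
    previous = None
    for token_id in best_ids:
        if token_id != previous and token_id != blank_id:
            decoded.append(vocab[token_id])
        previous = token_id
    return "".join(decoded)


def one_hot(index, size=len(VOCAB)):
    """Hypothetical helper: a frame that puts all its score on one token."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Frames: h h <blank> o l l <blank> a
frames = [one_hot(i) for i in [1, 1, 0, 2, 3, 3, 0, 4]]
print(greedy_ctc_decode(frames))  # hola
```

Because greedy decoding looks at each frame in isolation, it cannot prefer a plausible word over an acoustically similar non-word; that is the gap the language model is meant to close.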
**Improvement**
This model was compared with the original checkpoint on 512 speech samples from the Spanish Common Voice test set.
Decoding with the language model lowers WER from 10.20% to 8.44%, a relative reduction of roughly 17%, and CER from 3.24% to 2.93%:
| Model | WER | CER |
| ------------- | ------------- | ------------- |
| patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm | **8.44%** | **2.93%** |
| jonatasgrosman/wav2vec2-large-xlsr-53-spanish | 10.20% | 3.24% |
The results can be reproduced by running the following commands *from this model repository*:
```
python run_ngram_wav2vec2.py 1 512
```
```
python run_ngram_wav2vec2.py 0 512
```
with `run_ngram_wav2vec2.py` being
https://huggingface.co/patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm/blob/main/run_ngram_wav2vec2.py.
The second argument matches the number of evaluation samples; the first appears to toggle language-model decoding (`1` with, `0` without).
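To make the comparison concrete, here is a minimal sketch of how a word error rate can be computed and how the relative improvement follows from the table values. It is not the code in `run_ngram_wav2vec2.py`; the function names and the example sentences are illustrative assumptions, and the reference is assumed to be non-empty.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # previous[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Fraction by which an error rate dropped relative to the baseline."""
    return (baseline - improved) / baseline


# One substitution and one deletion against a five-word reference.
print(word_error_rate("hola buenos dias a todos", "hola bueno dias todos"))  # 0.4

# Relative improvements computed from the WER and CER columns of the table.
print(f"{relative_reduction(10.20, 8.44):.1%}")  # 17.3%
print(f"{relative_reduction(3.24, 2.93):.1%}")   # 9.6%
```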