patrickvonplaten / wav2vec2-large-xlsr-53-spanish-with-lm

	---
	language: es
	datasets:
	- common_voice
	metrics:
	- wer
	- cer
	tags:
	- audio
	- automatic-speech-recognition
	- speech
	- xlsr-fine-tuning-week
	license: apache-2.0
	---

	# Wav2Vec2-Large-XLSR-53-Spanish-With-LM

	This is a copy of [Wav2Vec2-Large-XLSR-53-Spanish](https://huggingface.co/jonatasgrosman/wav2vec2-large-xlsr-53-spanish)
	with added language model support.

	This model card serves as a demo for the [pyctcdecode](https://github.com/kensho-technologies/pyctcdecode) integration
	with Transformers, introduced in [this PR](https://github.com/huggingface/transformers/pull/14339). The PR explains in detail how the
	integration works.
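
As background for what the integration replaces, here is a minimal sketch of greedy CTC decoding, the argmax-based step that pyctcdecode's beam search with a language model takes over. The `greedy_ctc_decode` helper, the toy vocabulary, and the scores are illustrative assumptions and not code from the PR; using `<pad>` as the blank token follows the usual Wav2Vec2 vocabulary convention.

```python
def greedy_ctc_decode(frame_scores, vocab, blank="<pad>"):
    """Greedy CTC decoding for one utterance.

    frame_scores: list of per-frame score lists, one score per vocab entry.
    vocab: list of tokens, indexed like the scores.
    """
    # 1. Pick the highest-scoring token at every frame (the argmax step).
    best_ids = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    # 2. Collapse consecutive repeats, then 3. drop blank tokens.
    tokens = []
    previous = None
    for idx in best_ids:
        if idx != previous and vocab[idx] != blank:
            tokens.append(vocab[idx])
        previous = idx
    return "".join(tokens)


# Toy example (hypothetical values): six frames over a five-token vocabulary.
vocab = ["<pad>", "h", "o", "l", "a"]
frame_scores = [
    [0.1, 0.8, 0.0, 0.1, 0.0],  # h
    [0.1, 0.7, 0.1, 0.1, 0.0],  # h again, collapsed with the previous frame
    [0.6, 0.1, 0.1, 0.1, 0.1],  # blank
    [0.0, 0.1, 0.8, 0.1, 0.0],  # o
    [0.1, 0.0, 0.1, 0.7, 0.1],  # l
    [0.2, 0.0, 0.0, 0.1, 0.7],  # a
]
print(greedy_ctc_decode(frame_scores, vocab))  # hola
```

Greedy decoding commits to the best token at each frame independently. A beam-search decoder keeps several candidate transcriptions alive and can rescore them with a language model, which is the part pyctcdecode contributes.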

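	In a nutshell: the PR adds a new `Wav2Vec2ProcessorWithLM` class as a drop-in replacement for `Wav2Vec2Processor`.

	The existing ASR pipeline, which uses greedy (argmax) decoding, is shown below. The only change needed is to load the processor with the new class and pass the raw logits, instead of the argmax token IDs, to `batch_decode`:

	```python
	import torch
	from datasets import load_dataset
	from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

	# Stream a single sample from the Spanish Common Voice test split
	ds = load_dataset("common_voice", "es", split="test", streaming=True)

	sample = next(iter(ds))

	model = Wav2Vec2ForCTC.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")
	processor = Wav2Vec2Processor.from_pretrained("patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm")

	# The model expects 16 kHz audio; resample first if sample["audio"]["sampling_rate"] differs
	input_values = processor(sample["audio"]["array"], return_tensors="pt").input_values

	# Inference only, so no gradients are needed
	with torch.no_grad():
	    logits = model(input_values).logits

	# Greedy decoding: pick the most likely token at every frame
	prediction_ids = torch.argmax(logits, dim=-1)

	transcription = processor.batch_decode(prediction_ids)

	print(transcription)
	```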
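
With the LM-enabled processor, the decoding step could look roughly like the sketch below. It assumes the processor class is available as `Wav2Vec2ProcessorWithLM` in the installed Transformers version, that `pyctcdecode` and `kenlm` are installed, and that this repository ships the language model files. `transcribe_with_lm` is a hypothetical helper name, not part of any library.

```python
MODEL_ID = "patrickvonplaten/wav2vec2-large-xlsr-53-spanish-with-lm"


def transcribe_with_lm(audio_array, sampling_rate=16_000, model_id=MODEL_ID):
    """Transcribe one 16 kHz waveform with LM-boosted beam-search decoding.

    Hypothetical helper: assumes `model_id` contains both the acoustic model
    and the language model files expected by Wav2Vec2ProcessorWithLM.
    """
    # Imported lazily so that defining the helper does not require the heavy
    # dependencies (torch, transformers, pyctcdecode, kenlm) to be installed.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM

    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    processor = Wav2Vec2ProcessorWithLM.from_pretrained(model_id)

    inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")

    with torch.no_grad():
        logits = model(inputs.input_values).logits

    # No argmax here: the raw logits go straight to the beam-search decoder,
    # and the decoded strings are read from the `.text` field of the result.
    return processor.batch_decode(logits.numpy()).text
```

Calling `transcribe_with_lm(sample["audio"]["array"])` on a clip that has already been resampled to 16 kHz should return a list with one transcription string. Loading the model inside the helper keeps the sketch short; for repeated calls it would make sense to load the model and processor once and reuse them.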


	| Model | WER | CER |
	| ------------- | ------------- | ------------- |
	| jonatasgrosman/wav2vec2-large-xlsr-53-spanish | 8.81% | 2.70% |
	| pcuenq/wav2vec2-large-xlsr-53-es | 10.55% | 3.20% |
	| facebook/wav2vec2-large-xlsr-53-spanish | 16.99% | 5.40% |
	| mrm8488/wav2vec2-large-xlsr-53-spanish | 19.20% | 5.96% |
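
For readers unfamiliar with the two metrics in the table, the sketch below computes word error rate (WER) and character error rate (CER) from their standard definitions: edit distance divided by the reference length, over words and characters respectively. The model card does not say how its numbers were computed, so the function names, the whitespace handling, and the example sentences are assumptions for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        current = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (r != h)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by the reference length.

    Assumption: spaces count as characters; some toolkits strip them first.
    """
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical example: one wrong word out of four, one missing character out of 19.
reference = "hola como estas hoy"
hypothesis = "hola como esta hoy"
print(f"WER: {wer(reference, hypothesis):.2%}")  # WER: 25.00%
print(f"CER: {cer(reference, hypothesis):.2%}")  # CER: 5.26%
```

A single wrong character costs a whole word in WER but only one character in CER, which is why the CER column is consistently lower than the WER column.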