Model Description
The Wav2vec2 base model facebook/wav2vec2-base-960h, fine-tuned for phoneme recognition in Dutch.
Usage
To transcribe audio files into phonemes, the model can be used as a standalone acoustic model as follows:
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
from datasets import load_dataset, Audio
import torch
# load model and processor (feature extractor + tokenizer)
processor = Wav2Vec2Processor.from_pretrained("Clementapa/wav2vec2-base-960h-phoneme-reco-dutch")
model = Wav2Vec2ForCTC.from_pretrained("Clementapa/wav2vec2-base-960h-phoneme-reco-dutch")
model.eval()
# load the Dutch Common Voice validation split
ds = load_dataset("common_voice", "nl", split="validation")
# wav2vec2-base-960h expects 16 kHz audio; resample in case the dataset uses another rate
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))
# extract input features from the raw waveform
input_values = processor(ds[0]["audio"]["array"], sampling_rate=16_000, return_tensors="pt", padding="longest").input_values  # Batch size 1
# retrieve logits (no gradients needed for inference)
with torch.no_grad():
    logits = model(input_values).logits
# take argmax and decode into a phoneme string
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)
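The last two steps above (argmax over the logits, then decoding) amount to greedy CTC decoding: consecutive repeated frame labels are merged and blank frames are dropped. The following is a minimal plain-Python sketch of that collapse rule, not the library's own implementation. The function name `ctc_greedy_collapse` and the blank id of 0 are assumptions for illustration; the actual blank id depends on the model's tokenizer.

```python
def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Collapse a per-frame argmax sequence into a label sequence.

    blank_id=0 is an illustrative assumption, not the model's real blank id.
    """
    collapsed = []
    previous = None
    for idx in frame_ids:
        # Keep a label only if it differs from the previous frame and is not blank.
        if idx != previous and idx != blank_id:
            collapsed.append(idx)
        previous = idx
    return collapsed


# Frames: blank, 1, 1, blank, 1, 2, 2, blank
labels = ctc_greedy_collapse([0, 1, 1, 0, 1, 2, 2, 0])
```

Note that a blank frame between two identical labels keeps them as two separate phonemes, which is why the blank symbol is needed in the first place.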
Evaluation results
- Test PER on CommonVoice (clean) test set (self-reported): 20.830
- Val PER on CommonVoice (clean) test set (self-reported): 16.180
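PER (phoneme error rate) is commonly defined as the total edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes. The sketch below shows that common definition in plain Python. How the figures above were actually computed is not stated here, so the helper names `edit_distance` and `phoneme_error_rate` and the toy phoneme sequences are assumptions for illustration only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences (lists of symbols)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Total edits divided by total reference phonemes, as a fraction."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_phonemes = sum(len(r) for r in references)
    return total_edits / total_phonemes


# Hypothetical toy data: one substitution in the first pair, one deletion in the second.
refs = [["h", "A", "l", "o"], ["d", "A", "x"]]
hyps = [["h", "a", "l", "o"], ["d", "x"]]
per = phoneme_error_rate(refs, hyps)
```

Multiply the result by 100 to express it as a percentage.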