---
language: hr
datasets:
- parlaspeech
tags:
- audio
- automatic-speech-recognition
widget:
- example_title: example 1
  src: https://huggingface.co/classla/wav2vec2-xls-r-sabor-hr/raw/main/00020570a.flac.wav
- example_title: example 2
  src: https://huggingface.co/classla/wav2vec2-xls-r-sabor-hr/raw/main/00020578b.flac.wav
---
# wav2vec2-xls-r-sabor-hr
This model is based on [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) and was fine-tuned on 72 hours of recordings and transcripts from the Croatian parliament. The transcripts are an early result of the second iteration of the [ParlaMint project](https://www.clarin.eu/content/parlamint-towards-comparable-parliamentary-corpora) and will be extended and published under a permissive license.
The work was coordinated by Nikola Ljubešić. Ivo-Pavao Jazbec performed the manual data alignment, Vuk Batanović and Lenka Bajčetić applied the method from [Plüss et al.](https://arxiv.org/abs/2010.02810), and Peter Rupnik performed the final modelling.
An initial evaluation on partially noisy data showed a word error rate (WER) of 13.68% and a character error rate (CER) of 4.56%.
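
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how word and character error rates are commonly computed: the Levenshtein edit distance between the reference and the model output, divided by the reference length. It is not the evaluation script used for this model. The function names and the example sentences are made up for illustration, and any text normalisation (casing, punctuation) applied in the actual evaluation is not known and is therefore omitted.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Substitutions, insertions and deletions each cost 1.
    Works on lists of words as well as on plain strings.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference, hypothesis):
    """Character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical reference transcript and model output (not from the dataset).
reference = "poštovani zastupnici dobar dan"
hypothesis = "poštovani zastupnice dobar dan"

print(f"WER: {word_error_rate(reference, hypothesis):.2%}")  # WER: 25.00%
print(f"CER: {char_error_rate(reference, hypothesis):.2%}")  # CER: 3.33%
```

A single wrong character turns one of four words into an error but changes only one of thirty characters, which is why the CER is typically much lower than the WER for the same output.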