Update README.md
README.md (changed)
@@ -17,16 +17,18 @@ widget:
 
 # wav2vec2-large-slavic-parlaspeech-hr-lm
 
-This model for Croatian ASR is based on the [facebook/wav2vec2-large-slavic-voxpopuli-v2 model](facebook/wav2vec2-large-slavic-voxpopuli-v2) and was fine-tuned with 300 hours of recordings and transcripts from the ASR Croatian parliament dataset [ParlaSpeech-HR v1.0](http://hdl.handle.net/11356/1494) and enhanced with a language model.
-
-The efforts resulting in this model were coordinated by Nikola Ljubešić, the rough manual data alignment was performed by Ivo-Pavao Jazbec, the method for fine automatic data alignment from [Plüss et al.](https://arxiv.org/abs/2010.02810) was applied by Vuk Batanović and Lenka Bajčetić, the transcripts were normalised by Danijel Korzinek, while the final modelling was performed by Peter Rupnik.
+This model for Croatian ASR is based on the [facebook/wav2vec2-large-slavic-voxpopuli-v2 model](https://huggingface.co/facebook/wav2vec2-large-slavic-voxpopuli-v2) and was fine-tuned with 300 hours of recordings and transcripts from the ASR Croatian parliament dataset [ParlaSpeech-HR v1.0](http://hdl.handle.net/11356/1494) and enhanced with a 5-gram language model based on the [ParlaMint dataset](http://hdl.handle.net/11356/1432).
 
 If you use this model, please cite the following paper:
 
-Nikola Ljubešić, Danijel Koržinek, Peter Rupnik, Ivo-Pavao Jazbec. ParlaSpeech-HR -- a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus.
+Nikola Ljubešić, Danijel Koržinek, Peter Rupnik, Ivo-Pavao Jazbec. ParlaSpeech-HR -- a freely available ASR dataset for Croatian bootstrapped from the ParlaMint corpus. Accepted at ParlaCLARIN@LREC.
+
+There are similarly performing models available, one [that does not use a language model](https://huggingface.co/classla/wav2vec2-slavic-parlaspeech-hr) and [another that is based on the XLS-R model](https://huggingface.co/classla/wav2vec2-xls-r-parlaspeech-hr).
 
 ## Metrics
 
+Evaluation is performed on the dev and test portions of the [ParlaSpeech-HR v1.0](http://hdl.handle.net/11356/1494) dataset.
+
 |split|CER|WER|
 |---|---|---|
 |dev|0.0253|0.0556|
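The CER and WER columns in the README's Metrics table are character and word error rates. They appear to be fractions, so a WER of 0.0556 would mean roughly 5.6 word errors per 100 reference words. The sketch below is for illustration only. It computes both rates in the usual Levenshtein-based way: the edit distance (substitutions, deletions, insertions) summed over all utterances, divided by the total reference length. It is not the evaluation script used for this model, and the function names are hypothetical. Choices such as text normalisation, casing, and whether spaces count as characters for CER will shift the numbers.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit):
    """Corpus-level rate: total edits divided by total reference units."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = sum(edit_distance(unit(r), unit(h)) for r, h in zip(refs, hyps))
    total = sum(len(unit(r)) for r in refs)
    if total == 0:
        raise ValueError("references are empty")
    return errors / total


def wer(refs, hyps):
    # Hypothetical helper: words are whitespace-separated tokens.
    return error_rate(refs, hyps, str.split)


def cer(refs, hyps):
    # Hypothetical helper: every character, including spaces, is a unit.
    return error_rate(refs, hyps, list)


if __name__ == "__main__":
    refs = ["dobar dan svima"]
    hyps = ["dobar dan sve"]
    print(wer(refs, hyps))  # 1 substituted word out of 3 reference words
    print(cer(refs, hyps))  # 3 character edits out of 15 reference characters
```

Summing edits and reference lengths over the whole split, rather than averaging per-utterance rates, keeps short utterances from dominating the score. This is a common convention, but whether the numbers in the table were aggregated this way is not stated.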