nvidia/stt_hr_conformer_transducer_large

nljubesi committed on Dec 16, 2023

Commit 0b588e1

1 Parent(s): 7af1b98

Adding links to the ParlaSpeech dataset / paper (#1)

- Adding links to the ParlaSpeech dataset / paper (00343de3bc3c9f2d67b19610b6ef4d767298357b)

Co-authored-by: Nikola Ljubešić <[email protected]>

Files changed (1)

README.md

@@ -95,7 +95,7 @@ Full config can be found inside the `.nemo` files.
 ### Datasets
-All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset, which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
+All the models in this collection are trained on ParlaSpeech-HR v1.0 Croatian dataset [4,5], which contains around 1665 hours of training data, 2.2 hours of development and 2.3 hours of test data after data cleaning.
 ## Performance
@@ -117,4 +117,10 @@ Since the model is trained just on ParlaSpeech-HR v1.0 dataset, the performance
 - [2] [Google Sentencepiece Tokenizer](https://github.com/google/sentencepiece)
-- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+- [3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)
+- [4] [ParlaSpeech-HR dataset](http://hdl.handle.net/11356/1494)
+- [5] [ParlaSpeech-HR - a Freely Available ASR Dataset for Croatian Bootstrapped from the ParlaMint Corpus](https://aclanthology.org/2022.parlaclarin-1.16/)
+  -